Journal of Applied Psychology 


John G. Darley, Editor
University of Minnesota 





Table of Contents 


Preference for New Products and Its Relationship to Different Measures of Conformity: 
Walter Gruen 


Effects of Target Width and Crosshair Width on Tracking Performance: David Holstein. ... 365 

A Comparison of the Effects of Training and Secondary Tasks on Tracking Behavior: W. D. 

Multifinger Tapping Performance as a Function of the Direction of Tapping Movements: 
Lyle R. Creamer and Don A. Trumbo 

An Objective Validation of Factual Interview Data: David J. Weiss and Rene V. Dawis.... 381 


Motivation in Management: A Study of Four Managerial Levels: Hjalmar Rosen and Charles 


Four Semantic Rating Scales Compared: William D. Wells and Georgianna Smith 


Predicting Ratings of Sales Success with Objective Performance Information: Wayne K. 
Kirchner 


Multidimensional Scatterplotting: A Graphic Approach to Profile Analysis: Bernard Rimland. 404 


The Validity of Behavioral Rating Scale Items for the Assessment of Individual Creativity: 





This is the last issue of Volume 44, 
Volume Title Page and Contents appear herein. 





American Psychological Association 


Volume 44, Number 6 December 1960 





Consulting Editors 


Harold E. Burtt, Ohio State University
Alphonse Chapanis, Johns Hopkins University
Clifford E. Jurgensen, Minneapolis Gas Company
Laurence S. McGaughran, University of Houston
Quinn McNemar, Stanford University
Alexander Mintz, City College of New York
Harold F. Rothe, Fairbanks, Morse and Company
Julian B. Rotter, Ohio State University
Thomas A. Ryan, Cornell University
Donald E. Super, Columbia University
Miles A. Tinker, University of Minnesota
Alfred C. Welch, University of New Mexico





This journal gives primary consideration to original investigations in any field of applied psychology except clinical psychology, although a descriptive or theoretical article may be accepted if it represents a special contribution in an applied field. Quantitative investigations of interest or value to psychologists working in the following broad fields will be considered: vocational and educational prognosis, diagnosis, and guidance at the secondary and college level; personnel research in business, industry, and government; engineering psychology; industrial working conditions; research on opinion and morale factors; job analysis and classification research; market and advertising research.


Because of the large number of manuscripts submitted, authors should adhere to the rule of “brevity consistent with clarity.” The typical manuscript should run to approximately 4,000 words. There is a lag of approximately twelve months between receipt and publication of an article. Authors may request advanced publication if they are prepared to pay the cost of printing the necessary extra pages.


Kenneth E. Clark will become Editor of the Journal of Applied Psychology, effective January 1, 1961. Because of the publication lag in this Journal, manuscripts from now on should be submitted to the Editor-elect:


Dr. Kenneth E. Clark 

Office of the Dean 

College of Arts and Sciences 
University of Colorado 
Boulder, Colorado 


All manuscripts must be submitted in duplicate. 
Original figures are prepared for publication; dupli- 
cate figures may be photographs or pencil-drawn 
copies. 


Manuscripts must conform to the style require- 
ments described in the Publication Manual of the 
American Psychological Association. 





Journal of Applied Psychology 


Published bimonthly by the 
American Psychological Association 
Prince and Lemon Sts., Lancaster, Pa. 
and 1333 Sixteenth Street N.W. 
Washington 6, D. C. 


$8.00 per volume 


Arthur C. Hoffman
Managing Editor 


$1.50 per issue 


Helen Orr
Promotion Manager 


Subscriptions, orders, and business communications should be addressed to the American Psychological Association, 1333 Sixteenth St. N.W., Washington 6, D. C. Address changes must reach the subscription office by the 10th of the month to take effect the following month. Undelivered copies resulting from address changes will not be replaced; subscribers should notify the post office that they will guarantee second-class forwarding postage. Other claims for undelivered copies must be made within four months of publication.


Second class postage paid at Lancaster, Pennsylvania and at additional mailing places. 


© 1961 by the American Psychological Association, Inc. 





Journal of Applied Psychology 








Vol. 44, No. 6


DECEMBER 1960 








PREFERENCE FOR NEW PRODUCTS AND ITS RELATIONSHIP 
TO DIFFERENT MEASURES OF CONFORMITY 


WALTER GRUEN ¹


Beth Israel Hospital, Boston 


From the writings of Whyte (1956), Fromm (1955), and others, the emerging picture of the middle class American is that of a person who is sensitive to those around him, who wants to keep up-to-date and deny himself any behavior which might be at variance with that of his peers. He accomplishes this program by adapting his attitudes as well as his consumer habits and other actions, so that he does not stand out as a deviant. He also becomes sensitized to the mass media of communication which tell him what is the accepted and latest style in spending his time and his money. According to these authors and others, being up-to-date becomes an important goal and may manifest itself in buying the latest and newest consumer goods. This pattern is supposed to be so widespread that the advertisers, who are sometimes accused of being the originators of it, extol the virtue of new products and of the latest styles, and thereby induce the middle class consumer to exchange his older products for the shiny new ones every few years. The designation “Built-in Obsolescence” conveys the psychologically induced dislike for a styled product which is several years old and no longer similar in appearance to more recent stylings of essentially the same product.

With some misgivings the hypothesis was therefore entertained that some people prefer new styles and new exteriors regardless of the type of product, and that these people were a homogeneous group by virtue of their membership in the species of conforming and self-negating “Organization Men.” All that remained was to test for reactions to new versus older styles in a variety of products, find the percentage of people who reacted favorably to newness, and then subject them to a couple of measures of conformity for an analysis of variance.

¹ Formerly at the University of Buffalo.


METHODS AND PROCEDURES 


Thirty slides in 10 groups of three were prepared by collecting three pictures of different styles or patterns from 10 different kinds of contemporary products. All 10 different kinds of products were known to have undergone a history of style changes over the past years. The three pictures selected for each product were either from the same catalogue or from the same magazine, so that color reproduction, layout, or size was fairly evenly matched for each trio. Thus there were three each of small foreign cars, silverware patterns, dinnerware, spring suit ensembles, dresses, living room furniture, office furniture, project house models, labels from canned soups, and labels from instant coffees. One slide from each trio was variously labeled as a 1950 to 1956 pattern of so-and-so company, usually a well-known company manufacturing the particular product. A second slide of each trio was labeled the 1957-59 product of the same company, so that there was at least a year's difference between each pair, depending on the type of product shown. The years were varied purposefully in order to avoid any suspicion that the slides were shamelessly rigged. The third member of each trio was labeled as a product of a different company without the benefit of a date. This procedure was used to prevent a guess that the dates in themselves were important to us.

Hence there were 10 slides of relatively new products, 10 slides of products several years older, and 10 so-called neutral slides. They were scrambled and presented in a random order. For the coffees the “new and improved” Maxwell House coffee label was used on one slide, the older label, which is identical with the new label except for the omission of the “newness” message, was used for the second slide, and a different brand of instant coffee constituted the label for the third slide. For the soups the “new and improved” Campbell Chicken Noodle soup and the older similar label were used as well as a label from a beef noodle soup from a different company. For both beverages the Ss were served 2-ounce samples of three coffees and of three soups randomly mixed in with the other 24 slides. For the new and old coffees and soups they were given two servings of the actual new product; for the control slides the product shown was actually served. One of each of the two new/old servings for soup and coffee contained a few drops of a United States certified color to make it look slightly different.

In order to disguise the intent of the experiment even further, and to obtain a more sensitive measure of the reaction than is possible by asking for preference of one product over another, each slide was presented separately and the S was required to check 14 bipolar adjective dimensions from Osgood’s semantic differential (1957) for each slide. These 14 seven-point scales included 9 defining the original three factors found by Osgood as well as 5 more defining two additional factors found in the exhausting study of consumer preferences done by Wright (1958). For the present study only the three scales defining the evaluative factor were used.

A number of additional precautions were taken to control for various sources of error. For one-half of the Ss the dates on the crucial 10 pairs of slides were reversed (or the color on the two beverages changed to the other of the pair). A comparison of the two groups showed that each member of the pair was not more preferred when it was either old or new, except as will be shown in the breakdown of reactions below. The Ss were furthermore instructed to

    judge each object on the slide, or each food you sample according to how you feel about it now, not according to what you know about it. That is, let your feelings about what you see or taste here be your guide, not what you may know or think about the object. For instance, if you are shown a steam locomotive, don’t react to what it is made of or to the engineering of it or to what it can and cannot do, but to the picture we show you here.

This was done to prevent their reaction to the engineering or production improvement in some newer products. Actually 30 Ss were run without these instructions and their mean preference scores did not differ from those taken from the remaining Ss under the above-quoted instructions. All Ss were finally asked to take a guess about the purpose of the slide presentation to see if anyone suspected our ruse. Only one S vaguely felt that the dates were somehow supposed to get at different reactions. Four of the male Ss were sports car enthusiasts and were able to point out that our car labels in two of the three car slides were wrong. This defection was privately explained to them as a last-minute emergency measure, and did not affect their belief in the genuineness of the other labels as shown by their post-test reactions.

The 114 Ss, all introductory psychology students, were tested in four groups of roughly equal size. At the beginning of each session use of the semantic differential was explained to them. They were then asked to check 14 scales for each of 30 slides in a booklet handed to them. It contained one page of 14 dimensions for each of the 30 slides. The order of the 14 was randomly selected for each slide, so that no two pages were alike. The slides were then shown one by one until everyone in the room had checked an answer on all of the 14 scales. At the presentation of each of the six beverage slides, sample cups were handed out which the Ss were supposed to taste before checking their reactions.

The dependent variables followed the 45-minute slide presentation and consisted of the following tests relevant here.

An other/inner-directed scale, consisting of 20 items and developed by Gross (1959), was administered first. Each of the 20 items consisted of a pair of statements, one denoting inner-direction, one other-direction as these terms are used by Riesman, Glazer, and Denny (1950). Both members of a pair were equated for social desirability according to the methods used by Edwards (1953). The S was supposed to check the one he “agreed with more.” An example of an item pair is: “A person has to be able to sell himself” (other-directed), and “I have clear-cut goals in life” (inner-directed). The scale has a split-half reliability of .60 and a test-retest reliability of .63, and correlated between .30 and .50 with measures indicating the degree of switch to peer opinion in a perceptual judgment task in a large sample of high school seniors.

The second scale was the Behavior Interpretation Inventory developed by Moeller and Applezweig (1957). Their questionnaire represented a method of measuring and conceptualizing motives based on the Murray and Maslow need systems. The questionnaire was multidimensional, insofar as it yielded scores on the relative strengths of four broad needs or motives, two of which were of interest here. Each of its 59 items consisted of a statement about a person’s behavior, and the S ranked the four alternative reasons for each behavior description in the order of the likelihood of their being the causes for the behavior in question. One of the four choices was always for reasons of social approval or conformity, another for reasons of self-approval or so-called inner-directedness. Hence social approval and self-approval scores were possible and were computed for each S.

A third measure was a new and experimental version of an American core culture inventory, which was designed to measure agreement with attitudes and values supposedly typical of American core culture or American common man culture, as these concepts have been developed by Loeb (1956) and by Warner, Meeker, and Eells (1949), respectively. The various items consisted of value statements which the S had to check on a seven-point scale of agreement/disagreement. The construction, reliability, and validation of the scale will be described in a future publication awaiting data from other countries.

In addition to these scores, 32 of the Ss had also been in an experiment which involved exposure to the Asch conformity situation (1952) with the electrical modification as proposed by Crutchfield (1955) with automatic equipment. Eighty-seven Ss participated in an experiment which used a variation of the Asch slides as developed by Feldman and others (Feldman & Bierman, 1959; Feldman & Goldfried, in press). Here the response lines on the slides contained two equal to the standard and one unequal. The electrical stooges picked one of the two correct lines in the 12 crucial trials, and the conformity score for each S consisted of the number of times he picked the exact same correct answer which had been “chosen” by the stooges.

Evaluation factor scores were computed for each S by summing his three responses on the three seven-point evaluation scales for each slide. The evaluation factor scores for each pair in the 10 pairs of dated slides were subtracted from each other to see if he preferred the newer or the older product, or showed no preference. The 114 Ss were then divided into five groups: those preferring seven or more of the newer products out of the possible 10 pairs, those preferring six of the newer products, those preferring a majority (six or more) of the older products, those having roughly an equal number of choices for old and new products, and a special group who showed four or more zero preferences (that is, having identical reactions to the two products in four or more instances). Analysis of variance was then used for these five groupings with each of the five dependent variables.
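The scoring and grouping procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the group labels, and the order in which the grouping rules are checked are hypothetical and do not come from the article, which does not say how overlapping cases (for example six "new" choices together with four "old" ones) were resolved.

```python
def evaluation_score(ratings):
    """Sum the three seven-point evaluative ratings given to one slide."""
    return sum(ratings)


def classify_subject(pair_scores):
    """Assign one subject to a preference group.

    pair_scores holds 10 (new_score, old_score) tuples, one per dated pair.
    The rule order below is an assumption made for this sketch.
    """
    diffs = [new - old for new, old in pair_scores]
    n_new = sum(d > 0 for d in diffs)    # pairs where the "new" slide scored higher
    n_old = sum(d < 0 for d in diffs)    # pairs where the "old" slide scored higher
    n_zero = sum(d == 0 for d in diffs)  # pairs with identical reactions

    if n_new >= 7:
        return "new_7_plus"
    if n_new == 6 and n_old <= 3:
        return "new_6"
    if n_old >= 6:
        return "old_6_plus"
    if n_zero >= 4:
        return "no_preference"
    return "roughly_equal"
```

For example, a subject whose summed score favors the "new" slide in seven pairs and the "old" slide in three would be placed in the first group.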
RESULTS 
Out of 114 Ss only 15 preferred seven or 
more of the new slides out of a total of 10 
such pairs, 16 more preferred six new slides 
and preferred only three or fewer older ones.
Seventeen Ss preferred six or more old slides, while
51 had roughly an equal number of choices 
in either direction, and 15 more also showed 
equal preference but indicated no preference 
in four or more of their comparisons. If one 
combines the last two groups, there are 66 Ss 
or 58% who show no particular preference for 
either older or newer products as labeled here, 
while only 13% show a definite preference for 
newer products and another 15% prefer older 
products to a greater or lesser degree. These 
results indicate that there is no general pref- 
erence for “newness” in a sample of college 
students. Reactions to each of the two poles 
of this dimension in terms of preference are 
limited to a small minority of students. 
When analyses of variance were computed 
for the five group means on each of the 
dependent variables, no significant F ratios 
emerged. A computation of t tests between




the group preferring new products and the 
group preferring old ones or the group which 
had no clear-cut preference was equally un- 
rewarding. Hence one must conclude that the 
two groups who show either a strong prefer- 
ence for new or for old products are homo- 
geneous with regard to this measure, but not 
with regard to the various indicators of con- 
formity or other-directedness. The groups may 
be homogeneous in terms of other dimensions 
not included here. 
DISCUSSION 

First of all we would like to remind the 
reader of the striking fact that less than one- 
quarter of the Ss showed a clear-cut prefer- 
ence for “newer” products. (Only three out 
of the 17 Ss who generally preferred “older” 
products did so 7 out of 10 times.) The over- 
whelming majority of the Ss seemed uninflu- 
enced by the date labels in making their in- 
dividual judgment about each slide. One of 
the implications one cannot escape here is
that the reactions to a variety of consumer
goods, at least to their visual, two-dimensional
representation, were not based on how up-to-
date the article was. Perhaps, as Riesman
et al. (1950) have already intimated, the ad- 
vertiser has overshot the mark by stressing 
too frequently the newness of his product as 
a sales appeal. Perhaps the results herald a 
reaction against the ever changing parade of 
products in new clothing and against the ad- 
vertiser’s claim that the newest style is neces- 
sarily the best and the one worth having. 
Perhaps the present generation will not reach
for a product whose appeal is primarily based 
on its newness but is becoming immune to its 
ever increasing emphasis so that preferences 
are based on other, maybe more individual- 
istic criteria. 

Of course we should not forget that the age 
differential between the products shown here 
was not more than a few years. One might 
argue that a more clear-cut preference for new 
products might have emerged if the “older” 
products had been put back further into an- 
tiquity. However, the experiment was not de- 
signed to test preference for contemporary 
versus “antique” objects. The 5-year age dif-
ferential used here is generally assumed to be
the difference between up-to-date and out-of- 







date products, and this difference was investi- 
gated here. 

A caution must be inserted about the gen- 
eralization of the results from 17-22 year old 
college students to the general population. It 
could be argued that our suburban consumer 
—once out in the “real” world, making a liv- 
ing and becoming consumer and judge of these 
products—might then show a preference and 
do so by virtue of this trend to conform. Only 
another experiment with adults in the sub- 
urban milieu can settle this problem. It re- 
mains a little hard to believe, however, that 
the college student should change so radically 
in his preference criteria a few years from 
now, especially if we remember that he is al- 
ready exposed to the same general culture as 
is the suburban consumer, and that he moves 
around on a college campus which also tends 
to have a conformity producing subculture. 


SUMMARY 


The hypothesis that there may be a wide- 
spread preference for newer consumer goods 
regardless of type of product, and that such 
preference might be an indication of general- 
ized conformity was investigated by showing 
10 pairs of slides of a variety of products. 
One member of the pair was labeled 1957, 
1958, or 1959, the other was labeled about 
5 years older. Ten undated control slides 
alternated with the random presentation of 
the other 20. The Ss were also given a scale 
measuring other-directedness, a behavior in- 
ventory which measured the degree to which 
they attributed personal or social reasons to 
a variety of behavior descriptions, and a scale 
measuring acceptance of so-called American 
core culture values. Some of the Ss were also 
subjected to the Asch conformity situation. 

The Ss consisted of 114 undergraduate col- 
lege students who reacted to the product 
slides by checking three evaluative adjective 
dimensions from Osgood’s semantic differen-
tial. Twelve percent of the Ss preferred 7 or
more of the 10 “new” products, which raises 




the question regarding the effectiveness of 
“newness” as an advertising device and as 
a criterion by which the contemporary gen- 
eration judges a product. Furthermore, none 
of the other measures of conformity distin-
guished the 12% or the 15% who more often


preferred “older” products, thus making it ap- 
pear that the homogeneity of judgment in 
about one-quarter of the Ss was based on 
other causes than on conformity. 


REFERENCES 

Asch, S. E. Social psychology. New York: Prentice-Hall, 1952.

Crutchfield, R. S. Conformity and character. Amer. Psychologist, 1955, 10, 191-198.

Edwards, A. L. The relationship between judged desirability of a trait and the probability that the trait will be endorsed. J. appl. Psychol., 1953, 37, 90-93.

Feldman, M. J., & Bierman, R. The effect of reward on conformity behavior. Amer. Psychologist, 1959, 14, 334. (Abstract)

Feldman, M. J., & Goldfried, M. R. Validation of group judgment as a factor affecting independence and conforming behavior. Sociometry, in press.

Fromm, E. The sane society. New York: Rinehart, 1955.

Gross, H. The relationship between insecurity, self-acceptance, other-direction and conformity under conditions of differential social pressure. Unpublished doctoral dissertation, University of Buffalo, 1959.

Loeb, M. Social class as evaluated behavior. Unpublished doctoral dissertation, University of Chicago, 1956.

Moeller, G., & Applezweig, M. H. A motivational factor in conformity. J. abnorm. soc. Psychol., 1957, 55, 114-120.

Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. The measurement of meaning. Urbana: Univer. Illinois Press, 1957.

Riesman, D., Glazer, N., & Denny, R. The lonely crowd. New Haven: Yale Univer. Press, 1950.

Warner, W. L., Meeker, Marchia, & Eells, K. Social class in America. Chicago: Science Research Associates, 1949.

Whyte, W. H., Jr. The organization man. New York: Simon & Schuster, 1956.

Wright, B. (Ed.) A semantic differential and how to use it. Chicago: Social Research, Inc., 1958.


(Received January 28, 1960) 





Journal of Applied Psychology 
1960, Vol. 44, No. 6, 365-369 




EFFECTS OF TARGET WIDTH AND CROSSHAIR WIDTH 
ON TRACKING PERFORMANCE 


DAVID HOLSTEIN ¹


International Business Machines Corporation, Bethesda, Maryland 


In the design of a control system for a perceptual-motor task, i.e., visual tracking, equipment factors such as the physical nature of the control, control system dynamics, the type and size of crosshair, etc. are taken into consideration. It is also generally thought that accuracy in tracking is somewhat related to the degree of refinement in the size and contour cues provided by the target and the cursor. Previous related studies (Bilodeau, 1952a, 1952b; Helmick, 1951; Lincoln & Smith, 1952; Morin, 1951) have investigated the target width variable, holding crosshair width constant, and have found relatively little change in performance for the target widths studied. However, the possibility that these two factors, target width and crosshair width, interact to affect tracking performance has been overlooked. Therefore, the purpose of this study is to investigate this problem of perceptual discrimination in tracking by studying target width and crosshair width simultaneously. More specifically, the following hypothesis was tested: when the ratio of target width to crosshair width equals 1, performance would be maximum; however, if this ratio were much greater than 1, or less than 1, there would be a performance decrement. The basis for this hypothesis is that when the ratio is close to 1 there would be a minimum of interpolation. Interpolation is defined as the process of estimating the position of a point relative to the reference within which it falls. The following specific questions are studied in this experiment: What effect does the interaction of target width and crosshair width have on tracking performance with respect to time and accuracy? What effect do target width and crosshair width have independently?

¹ The suggestions and criticisms of C. A. Bennett and the assistance of A. J. Pratt, R. E. Kent, and C. V. Deats in the execution of the experiment are gratefully acknowledged. This study was performed while at the Human Factors Department, IBM, Owego, New York.


PROCEDURE 


Apparatus and Task. The Optical Comparator, a device developed to simulate airborne radar displays in order to perform controlled human factors experiments, was employed in this study. It provided close control of the variables under investigation and allowed recording of time and error scores. The principal components were 35-mm. projection equipment and a ground glass screen 10” in diameter, on which stimulus materials could be projected. The S’s control was of the joystick type and capable of moving the stimulus materials at a constant rate of [illegible] and governed the horizontal and vertical movements of the target. The E had controls which governed the initial displacement (error to be corrected) of the stimulus materials, the counters which recorded the time scores, and the horizontal and vertical components of the error. The crosshairs were drawn on clear plastic and superimposed on the ground glass screen. The targets (circles) were drawn, photographically reduced, converted to 35-mm. slides and optically projected on the 10” ground glass screen. The variables studied were: (a) target width: 1/32”, 1/16”, 1/8”, 1/4”, 1/2”, 1”; and (b) crosshair width: 1/64”, 1/32”, 1/16”, 1/8”, 1/4”, 1/2”.


A compensatory tracking task was used in this study. Compensatory tracking is defined as one in which an operator is presented with a display consisting of a target and a zero reference point (or indicator) and is required to maintain the indicator on the target by compensating for the movements of the target imposed upon it by outside influences. Thus when the operator sees a discrepancy between a movable target and fixed crosshairs he displaces his control in order to bring the target back under the crosshairs.

After the E set in a random target error in direction and magnitude, the stimulus situation presented to the S was a display with fixed centered crosshairs and a target displaced from the center of the crosshairs. The S’s task then was to correct this discrepancy by manipulating his control such that the target would move under the fixed crosshairs. The S could move the target back and forth under the crosshairs and could stop the movement of the target at any time. The procedure in more detail and the S’s task for a typical trial are as follows: after the E chose a particular condition according to a predetermined design and set in a random target error in direction and magnitude, the S started a







TABLE 1

SUMMARY OF THE ANALYSIS OF VARIANCE PERFORMED
ON THE ABSOLUTE ERROR SCORES

Source of Variance    [heading illegible]    F
Target                                       4.15*
Crosshair                                    41.216*
T × C                                        3.939*
Within Cells          404.0506
Total                 611.7989

Note. The Error Term used was the Within Cells MS.
* Significant at .001 level.


trial by throwing a switch which turned on the display and energized the timing mechanism. The S then corrected the error so that the target was centered under the crosshairs. When the hand control was released the accuracy mechanism stopped and when the switch was closed the timing mechanism stopped.

Experimental Design. A two-dimensional factorial design was used whereby each S performed under all 36 conditions twice. This provided control of S variance. The 10 Ss used were an incidental sample of engineering personnel, all having served as Ss in similar experiments. Since there were 36 conditions (6 target widths and 6 crosshair widths), 10 Ss, and 2 replications of the 36 conditions per S, each condition was repeated 20 times. There were two sessions per S on succeeding days, each consisting of the 36 conditions presented in a random order.
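The layout of this design can be sketched in Python as a check on the counts. The sketch is illustrative only: the width values are read from the damaged text and figure axes and should be treated as assumptions, and the function name, the seed, and the use of a pseudo-random shuffle are hypothetical choices, since the article does not say how the random orders were produced.

```python
import itertools
import random
from collections import Counter
from fractions import Fraction

# Width levels in inches (assumed values; the source text is partly illegible).
TARGET_WIDTHS = [Fraction(1, d) for d in (32, 16, 8, 4, 2, 1)]
CROSSHAIR_WIDTHS = [Fraction(1, d) for d in (64, 32, 16, 8, 4, 2)]


def build_schedule(n_subjects=10, n_sessions=2, seed=0):
    """Return {subject: [session_1_order, session_2_order]}.

    Every session presents all 36 target x crosshair conditions
    once, in an independently shuffled order.
    """
    rng = random.Random(seed)
    conditions = list(itertools.product(TARGET_WIDTHS, CROSSHAIR_WIDTHS))
    schedule = {}
    for subject in range(1, n_subjects + 1):
        sessions = []
        for _ in range(n_sessions):
            order = conditions[:]   # copy so each session is shuffled separately
            rng.shuffle(order)
            sessions.append(order)
        schedule[subject] = sessions
    return schedule


schedule = build_schedule()
# Count how often each condition occurs over all subjects and sessions.
repeats = Counter(c for sessions in schedule.values() for s in sessions for c in s)
```

With 10 Ss and two sessions each, every one of the 36 conditions appears 20 times in `repeats`, which matches the count given in the text.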


Fig. 1. Absolute error as a function of both target width and crosshair width. (Axes: absolute error in inches; target width and crosshair width in inches.)


RESULTS 
Error Scores 

Table 1 shows the summary of the analysis of variance performed on the absolute error scores. Absolute error is defined as the distance between the center of the target and the center of the crosshair. The crosshair and target width main effects and the target × crosshair interaction were significant at the .001 level of significance.

Figure 1 shows the mean absolute errors summed across subjects as a function of target width and crosshair width. The error score shown is the hypotenuse of the triangle formed by the horizontal and vertical components of the error. The results show that, in general, as the crosshair width increases, errors increase, and that there is a tendency for a decreased error with large target widths.

TABLE 2

SUMMARY OF THE ANALYSIS OF VARIANCE PERFORMED
ON THE RECOVERY TIME SCORES

Source of Variance    MS
Target                78.604
Crosshair             160.764
T × C                 69.984
Within Cells          81.945
Total

Note. The Error Term used was the Within Cells MS.
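The absolute error score is described as the hypotenuse of the triangle formed by the horizontal and vertical error components. A minimal sketch of that computation follows; the function and argument names are hypothetical, and both components are assumed to be in inches.

```python
import math


def absolute_error(horizontal, vertical):
    """Distance between target center and crosshair center.

    Computed as the hypotenuse of the right triangle whose legs are
    the recorded horizontal and vertical error components.
    """
    return math.hypot(horizontal, vertical)
```

The sign of each component does not matter, since only the distance from the center of the crosshairs is scored.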


Time Scores 


Table 2 shows the summary of the analysis of variance performed on the recovery time scores. Recovery time is defined as the time elapsed between the start of a trial and the end of the final corrective control movement. The results show that neither the two main effects nor the interaction was significant.
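Since the Within Cells mean square is named as the error term, the F ratios for recovery time can be recomputed from the mean squares in Table 2. The sketch below assumes the four surviving numbers in Table 2 are mean squares in the row order shown; the dictionary keys and function name are hypothetical.

```python
# Mean squares as read from Table 2 (row assignment is an assumption).
MS_EFFECTS = {"Target": 78.604, "Crosshair": 160.764, "T x C": 69.984}
MS_WITHIN_CELLS = 81.945


def f_ratios(ms_effects, ms_error):
    """Divide each effect mean square by the error mean square."""
    return {name: ms / ms_error for name, ms in ms_effects.items()}


ratios = f_ratios(MS_EFFECTS, MS_WITHIN_CELLS)
```

All three ratios come out below 2, and two of them below 1, which is in line with the report that no effect on recovery time reached significance.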


Figure 2 shows the mean recovery time summed over subjects as a function of target width and crosshair width. Although these results are not statistically significant there is an indication that (a) as target width increases, recovery time decreases; (b) performance seems to be best in the region of 1/32” to 1/16” crosshair widths, with decreasing performance if the crosshair is either increased or decreased.


Fig. 2. Recovery time as a function of target width and
crosshair width.


Interaction Effects

Figure 3 shows absolute error as a function of the target width
to crosshair width ratio (T/C) for the “visual” and “nonvisual”
conditions. The visual conditions are defined as those where T/C
is greater than 1, while the nonvisual conditions are defined as
those where T/C is equal to or less than 1. The latter conditions
are called nonvisual because the crosshairs are larger than the
target and, therefore, the subject cannot see the target. The
results show that the error remains constant for all the visual
conditions. However, for the nonvisual conditions, error increases
as the ratio decreases from 1.
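The visual/nonvisual split defined above can be written as a small classifier on the T/C ratio. This is a minimal sketch. The function name and the example widths are illustrative assumptions, not values taken from the study's condition list.

```python
from fractions import Fraction


def classify_condition(target_width, crosshair_width):
    """Return (T/C, label) using the definition in the text:
    visual when T/C > 1, nonvisual when T/C <= 1."""
    ratio = Fraction(target_width) / Fraction(crosshair_width)
    label = "visual" if ratio > 1 else "nonvisual"
    return ratio, label


# Hypothetical width pairs in inches, chosen only for illustration.
examples = [
    (Fraction(1, 8), Fraction(1, 16)),   # target twice the crosshair
    (Fraction(1, 16), Fraction(1, 16)),  # equal widths
    (Fraction(1, 32), Fraction(1, 16)),  # crosshair hides the target
]
for target, crosshair in examples:
    ratio, label = classify_condition(target, crosshair)
    print(f"T = {target}, C = {crosshair}: T/C = {ratio} ({label})")
```

Exact fractions are used so that the boundary case T/C = 1 is classified as nonvisual without any rounding concerns.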

Figure 4 shows recovery time as a function of T/C ratios for the
visual and nonvisual conditions. The results show that there is an
optimum range for visual conditions between the ratios of 8 to 16,
with recovery time increasing if the ratio is increased or
decreased. For the nonvisual conditions, recovery time increases
as the ratio decreases from 1.


Fig. 3. Absolute error (inches) as a function of T/C (target
width / crosshair width) for the visual and nonvisual tasks of the
study.


Fig. 4. Recovery time as a function of T/C (target width /
crosshair width) for the visual and nonvisual tasks of the study.


DISCUSSION
Error Scores

In general, if the mean absolute errors shown in Figure 1 are
examined, the following relationships may be seen. Summed over all
crosshair widths, absolute error tends to decrease as target width
increases. Table 1 and Figure 1 show that crosshair width has a
significant effect on tracking accuracy. In general, the finer the
crosshair, the more accurate the tracking is. Figure 1 shows that
accuracy would not increase appreciably by reducing
crosshair width past +y”. 

Figure 1 seems to indicate that absolute 
error decreases as target width increases. 
However, when a closer look is taken at the 
means for every combination of conditions, 
this generality does not hold because an in- 
teraction exists between crosshair width and 
target width. Table 1 shows that this inter- 
action is significant at the .001 level of sig- 
nificance. The following effects can be distin- 
guished: 

1. The optimum condition in most cases occurs when T/C equals 2.
If the ratio is increased there is an indication that absolute
error increases, although this effect is not pronounced.

2. When T/C is less than 1, absolute error increases with
decreased ratios. One explanation of this effect is that when the
crosshair is larger than the target, the subject cannot see what
he thinks is the true center. He not only has to interpolate
between the edges of the target but has to perform this task
nonvisually. Figure 3 shows that nonvisual interpolation produces
greater error than visual interpolation. A previous study
(Holstein, 1957), which investigated the relationship between
visual and nonvisual interpolation, has also shown that the
magnitude of error is appreciably greater in nonvisual
interpolation than in its visual counterpart.

3. When T/C is greater than 1, error re- 
mained at a constant low level. 

In general, the error data tend to indicate that in visual
interpolation, error tends to remain constant as target width
increases. In nonvisual interpolation, errors increase at a very
fast rate as T/C decreases from 1. This effect is so gross that
when target width was summed over crosshair width, it appeared
that error decreased with increased target widths.


Recovery Time Scores 
An analysis of variance performed on the recovery time scores
indicates that the target width and crosshair width main effects
and their interaction are not statistically significant factors.
However, when a closer look is taken at Figure 2, some trends are
evident. There is a tendency for decreased recovery time as target
width increases, and an indication of an optimum crosshair width
within the range of 1/32” to 1/16”. Figure 4 shows that for the
visual conditions, an optimum is present between T/C of 8 and 16.
However, with increased or decreased ratios, recovery time scores
increase. For the nonvisual conditions, decreasing T/C from 1
results in increased time scores at a very fast rate.


CONCLUSIONS 


This study investigated tracking perform- 
ance as a function of crosshair width and 
target width. The following can be concluded: 

1. There is a significant interaction be- 
tween crosshair width and target width for 
error scores. 

2. A significant crosshair effect was evident for error scores,
with an optimum crosshair width between 1/32” and 1/16”. The
recovery time scores indicated that performance was best within
the above range.

3. The target width variable was signifi- 
cant, with increased target widths resulting 
in lower absolute errors and an indication of 
reduced recovery time scores. 

4. For the “visual” conditions, as the ratio of target width to
crosshair width increased, absolute error remained constant, while
an optimum was reached for recovery time between ratios of 8 and
16.

5. For the “nonvisual” conditions, decreasing T/C below 1
resulted in increasing error at a very fast rate.

6. If extremely high-precision alignment (accurate to
ten-thousandths) is required and virtually unlimited time is
available, a variable crosshair width control should be used. The
“optimum” condition of approximately a T/C of 2 should be used. In
most instances of lower precision requirements, however, a
crosshair size between 1/32” and 1/16” is optimum.







REFERENCES 


Bilodeau, E. A. A further study of the effects of target size
and goal attainment upon the development of response accuracy.
USAF Hum. Resour. Res. Cent. res. Bull., 1952, No. 52-7. (a)

Bilodeau, E. A. A preliminary study of the effects of reporting
goals as a function of different degrees of response accuracy.
USAF Hum. Resour. Res. Cent. res. Bull., 1952, No. 52-4. (b)

Helmick, J. S. Pursuit learning as affected by size of target
and speed of rotation. J. exp. Psychol., 1951, 41, 126-138.

Holstein, D. Relationship between visual and nonvisual
interpolation. Unpublished master’s thesis, Lehigh University,
1957.

Lincoln, R. S., & Smith, K. U. Systematic analysis of factors
determining accuracy in visual tracking. Science, 1952, 116,
183-187.

Morin, R. E. Transfer of training between motor tasks varying in
precision of movement required to score. Amer. Psychologist, 1951,
6, 390. (Abstract)


(Received January 16, 196 





Journal of Applied Psychology 
1960, Vol. 44, No. 6, 370-375


A COMPARISON OF THE EFFECTS OF TRAINING AND 
SECONDARY TASKS ON TRACKING BEHAVIOR 


W. D. GARVEY ¹


United States Naval Research Laboratory 


In a previous study (Garvey & Mitnick, 
1957) an analog computer model was em- 
ployed to describe the behavior of a human 
operator in a learning situation. It was found 
that as learning progressed the describing 
model altered in its terms. The present study 
seeks to extend the use of this model to situa- 
tions in which the operator is forced to per- 
form secondary tasks while tracking. 


APPARATUS AND PROCEDURE


A simplified block diagram of the apparatus is shown in Fig. 1.
The tracking task was compensatory, i.e., S sought to keep a dot
of a cathode ray tube centered on a hairline by manipulating a
pressure stick with a rate control. A force of 500 gr. on the
stick produced a rate of 10 cm./sec. of the dot on the scope.

The dot was forced off the hairline by either a constant rate,
acceleration, or rate of change of acceleration (jerk) input. The
graphs in Fig. 2 show amplitude-time plots of the fast input
functions used. Slow input functions were also used; for slow
inputs only 50 v. were accumulated in 60 sec. The sensitivity of
the display was such that 1.0 v. produced 0.5 cm. displacement of
the dot on the scope.

Six naval enlisted men served as Ss. The general experimental
design consisted of six Ss, six conditions (system inputs), and
six 60-sec. trials per condition counterbalanced in a 6 × 6 × 6
square design. This design was permuted from day to day throughout
the duration of the experiment, so that each S had a different
ordering of the six 60-sec. trials under each condition each day.
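The counterbalancing described above can be illustrated with a cyclic Latin square. This is a sketch under assumptions. The six condition names follow the slow and fast rate, acceleration, and jerk inputs mentioned in the text, but the cyclic construction and the day-to-day rotation rule are hypothetical stand-ins, since the actual permutation scheme is not given.

```python
def cyclic_latin_square(items):
    """Build an n x n square in which every item appears once per row
    and once per column (row r is the list rotated by r places)."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


# Six system inputs (condition labels assumed from the text).
INPUTS = ["slow rate", "fast rate", "slow accel",
          "fast accel", "slow jerk", "fast jerk"]
SQUARE = cyclic_latin_square(INPUTS)


def order_for(subject, day):
    """Hypothetical rule: shift each subject's row by one per day so
    that no S repeats the same ordering on consecutive days."""
    return SQUARE[(subject + day) % len(SQUARE)]


for subject in range(6):
    print(f"S{subject + 1}, day 0: {order_for(subject, 0)}")
```

Any square with the row and column property would serve; the cyclic form is used here only because it is the shortest to write down.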

Performance was measured in terms of error scores integrated
during the last 55 sec. of the 60-sec. trials. Two types of scores
were recorded: (a) absolute error, which is a measure of tracking
error without regard to whether the output leads or lags the
input, and (b) lead or lag error, which is a measure of whether
the output leads or lags the input.

After 25 days of training Ss were required to operate under a
series of secondary tasks designed to degrade performance. Each of
the following secondary tasks was given on a separate day with a
different counterbalancing of Ss and system inputs.

1. Arithmetic task. The Ss were required to solve two-digit
subtraction problems at a rapid pace while tracking.

2. Visual task. The Ss had to detect and report the range and
bearing of targets on a simulated radar scope while tracking.

3. Motor task. The Ss tracked with the right hand while
simultaneously performing a push-button task with the left hand.
The task consisted of pressing one of five buttons corresponding
to one of five light stimuli located on a display above the
tracking scope. Stimuli were presented at a rapid rate and in a
random order.


¹ The author wishes to thank F. V. Taylor for his valuable
suggestions in preparing the manuscript and to express his
appreciation to J. B. Henson, who ran Ss and assisted in the
analysis of the data.


RESULTS 
Absolute Error Scores

Three general findings relative to the absolute error scores are
noteworthy (see Fig. 3). First, the absolute error measurements
did not discriminate the inputs very effectively. During the early
stages of training the scores resulting from the rate inputs were
found to be reliably lower than those associated with other
inputs; no other comparisons were found to be reliable. During the
final stages of training none of the comparisons were found to be
reliable. Under the secondary tasks the rate inputs were found to
yield reliably lower scores than the other inputs. Second, the
result of training was to reduce the error scores associated with
all inputs. Third, the secondary tasks increased the error scores
to a level significantly above that found during the early stages
of practice.


Lead-Lag Error Scores 


Figure 4 shows idealized time plots of lag scores which would be
obtained under different circumstances. (The graph illustrates
this with only an acceleration input, but the principle is the
same for the other inputs as well.) The graph shows (a) an
acceleration input, (b) an output which continuously lags the
input by a fixed amount (2.0 volts), and (c) an output where the
lag continuously increases (.25 volt per second). If Ss tracked
without lag or lead the output would be identical with the input
(except, of course, that the output would contain a variable
error).

Figure 5 shows how the lead-lag errors are actually plotted in
this study. The broken line illustrates how a constant lag error
would appear. This curve is obtained by integrating over time the
difference between the system input curve and constant lag output
curve in Fig. 4. The dotted line in Fig. 5 shows the situation
which would result if S were to track the input with a lag error
which constantly increased as a trial progressed. It is obtained
by integrating over time the difference between the system input
curve in Fig. 4 and the curve showing the output with a constantly
increasing lag. If there were no lead-lag bias the error score
would yield a curve coincident with the baseline. If S were to
track the input with lead errors, the curves in Fig. 5 would lie
below 0 and have a negative slope.


² The statistical methods employed in the analyses of these data
were nonparametric tests. Wilcoxon’s test (1949) was used to test
the differences between two conditions and Friedman’s test (1937)
was used for the comparison of several conditions. Any differences
reported as reliable in this section were found to be
statistically significant at the 5% level of confidence or less.


Fig. 1. Simplified block diagrams of apparatus. (Labeled blocks:
jerk generator; human operator; output; electronic error
integrator, without regard to direction of error; Brown
potentiometer recorder, with regard to direction of error.)

Fig. 2. Inputs to system (volts) as a function of duration of
trial.

Fig. 3. Performance as measured by absolute error. (Median
integrated absolute error, in arbitrary units, for slow and fast
rate, acceleration, and jerk inputs during the early stages of
training, the final stages of training, and with secondary tasks.)


The effects of training on the lead-lag error scores are shown
in Fig. 6. A point lying above the 0 axis represents a lag error
and one falling below indicates a lead error. The curves show the
accumulation of lead-lag error scores during the last 55 seconds
of the 60-second trial for each type of input. Performance during
the early stages of training is represented by the solid lines;
that during the final stages is represented by the broken lines.

To determine the rate at which the lead or lag errors
accumulated during a trial, each S’s data were broken down into
five equal and consecutive intervals. The constancy of these error
scores was tested first by obtaining the difference between the
score at each interval and its preceding interval and then testing
(Friedman, 1937) whether these differences progressively increased
with the duration of the trial. If a significant total lead or lag
error had accumulated during a trial and the Friedman test showed
no significant progressive increase in the differences between
successive intervals, then it was assumed that the lead or lag
error increased at a constant rate, i.e., Ss tracked with a
constant lead or lag error. If the test revealed a significant
progressive increase in the difference, then it was assumed that
the lead or lag error scores accumulated with at least a constant
acceleration. In essence this test was made to determine whether
the curves in Fig. 6 show significant deviations from linearity.
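The logic of the integrated lag score and of the five-interval check can be sketched numerically. The two lag functions below use the idealized values quoted for Fig. 4 (a fixed 2.0-volt lag, and a lag growing at .25 volt per second). The function names, the time step, and the reading of "score at each interval" as the cumulative score at the end of that interval are assumptions made for illustration. No Friedman test is run here.

```python
def integrated_lag(lag_fn, start=5.0, stop=60.0, dt=0.01):
    """Cumulative integral of the lag (volts) over the scored part
    of the trial, i.e. the last 55 sec. of a 60-sec. trial."""
    steps = int(round((stop - start) / dt))
    total, curve = 0.0, []
    for n in range(steps):
        total += lag_fn(start + n * dt) * dt
        curve.append(total)
    return curve


def interval_differences(curve, n_intervals=5):
    """Cumulative score at the end of each interval minus the score
    at the end of the preceding interval."""
    size = len(curve) // n_intervals
    ends = [curve[(k + 1) * size - 1] for k in range(n_intervals)]
    return [ends[0]] + [ends[k] - ends[k - 1] for k in range(1, n_intervals)]


constant_lag = interval_differences(integrated_lag(lambda t: 2.0))
increasing_lag = interval_differences(integrated_lag(lambda t: 0.25 * t))

print("constant lag:  ", [round(d, 2) for d in constant_lag])
print("increasing lag:", [round(d, 2) for d in increasing_lag])
```

With a constant lag the five differences are equal, so the integrated curve is a straight line. With a steadily growing lag the differences increase from interval to interval, which is the pattern the test above was designed to detect.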

When this analysis was performed on the data shown in Fig. 6,
the results indicated that during the early stages of training the
trend of lead error scores with a slow rate input was found to be
constant. No reliable trend was found for the scores associated
with the fast rate input. The lag error scores resulting from the
slow and fast acceleration inputs were found to increase at a
constant rate. With the jerk inputs the lag error scores were
found to increase with a constant acceleration as the duration of
the trial progressed.

During the final stages of practice the 
trends of the lag error scores associated with 
the jerk inputs were the only ones found to 
be reliable; these were found to accumulate 
at a constant rate. 

A comparison of the training curves shown in Fig. 6 with the
curves in Fig. 7 illustrates the effect of secondary tasks on
performance. To present the results simply, the results of the
three secondary-task conditions are combined in Fig. 7. The
magnitude of the degradation following the sudden introduction of
the tasks varies greatly, but the form of the curves in each case
is generally the same. The curves in Fig. 7 are similar to those
in Fig. 6 during the early stages of practice. Both rate inputs
show significant lead errors, the acceleration inputs produce
significant lag errors which are accumulated at a constant rate,
and the jerk inputs show a significant accelerated build-up of lag
error. Furthermore, from an inspection of the curves it appears
that tracking performance under the secondary tasks might be
described as having regressed beyond that found during the early
stages of practice. This speculation is based on observations that
lead or lag error scores which developed during the course of
trials with secondary tasks are greater than those developed
during the trials in early training. Also, there is a tendency
(not reliably so) for the build-up of the lag error associated
with the acceleration inputs to be accelerated; the errors
associated with the jerk inputs appear to build up with a constant
rate of change of acceleration.


Fig. 4. Relationship of lagging outputs to system input. (The
graph shows an acceleration input and outputs with a constant lag
of 2.0 volts and with a constantly increasing lag of .25
volts/sec.)


Fig. 5. Integrated lag error score (arbitrary units) as a
function of duration of trial (sec). (The graph shows how a
constant lag error and a constantly increasing lag error will be
plotted in the results.)


DISCUSSION 


It should be pointed out that even though the lead-lag error
measure need not reflect the overall accuracy of S’s tracking
performance, it is a more “sensitive” measure of tracking behavior
than the absolute error scores, which are a direct measure of
accuracy. Not only did the lead-lag errors discriminate the inputs
more accurately and systematically, but also the changes in this
measure which resulted from training and adding secondary tasks
were more orderly. These results suggest that the lead-lag error
score is a measure of a change in behavior which underlies the
change in the accuracy of performance as reflected by the absolute
error score.


Fig. 6. Median integrated lead (-) or lag (+) error (arbitrary
units) as a function of the duration of the trial (sec) during two
stages of training. (The solid curves represent the results during
the early stages of training, Days 2-7, and the broken curves
represent those during the final stages of training, Days 17-22.
The top set of curves are for the rate inputs; the middle set, for
acceleration inputs; and the bottom set, for jerk inputs.)

Fig. 7. Median integrated lead (-) or lag (+) error (arbitrary
units) as a function of the duration of the trial (sec) during
stress. (The top set of curves are for the rate inputs; the middle
set, for the acceleration inputs; and the bottom set, for the jerk
inputs.)
Furthermore, the lead-lag error scores make it possible to
compare the performance of the human operator with a physical
model which could be constructed or set up on an analog computer
to perform the operator’s task. One of the mathematically simplest
classes of mechanisms to which the operator might profitably be
compared is illustrated in Fig. 8. If this tracking mechanism,
which consists of one integrator and a feed-forward loop, is set
up with properly adjusted gains and substituted in the place of
the human in the tracking system used in this experiment, then the
system will track a constant rate input with zero lag error, a
constant acceleration with a constant lag error, and a jerk input
with a constantly increasing lag error. If the complexity of the
mechanism is increased to one containing two integrators, then the
system will track the rate and acceleration inputs with zero error
and the jerk input with a constant lag error.
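The behavior claimed for the one-integrator and two-integrator mechanisms can be checked with a short simulation. This is a sketch under stated assumptions. The mechanism is read as a proportional path in parallel with one (or two) integrators, the rate-control stick is modeled as a pure integrator from command to dot position, and the gains, step size, and unit-amplitude inputs are hypothetical values chosen only to give a stable loop.

```python
def simulate(input_fn, gains, duration=60.0, dt=0.01):
    """Euler simulation of a compensatory loop. The error drives a
    mechanism with a feed-forward gain kp, a first integrator (gain
    ki) and an optional second integrator (gain kii); the mechanism
    output sets the rate of the dot. Returns the error history,
    with positive error meaning the output lags the input."""
    kp, ki, kii = gains
    y = i1 = i2 = 0.0
    errors = []
    for n in range(int(round(duration / dt))):
        e = input_fn(n * dt) - y
        errors.append(e)
        i1 += e * dt          # first integrator
        i2 += i1 * dt         # second integrator
        y += (kp * e + ki * i1 + kii * i2) * dt  # rate control
    return errors


def rate(t):
    return t                  # constant rate input


def accel(t):
    return t ** 2 / 2         # constant acceleration input


def jerk(t):
    return t ** 3 / 6         # constant jerk input


ONE_INTEGRATOR = (2.0, 1.0, 0.0)   # hypothetical gains
TWO_INTEGRATORS = (3.0, 3.0, 1.0)  # hypothetical gains

for name, gains in [("one", ONE_INTEGRATOR), ("two", TWO_INTEGRATORS)]:
    for label, fn in [("rate", rate), ("accel", accel), ("jerk", jerk)]:
        err = simulate(fn, gains)
        print(f"{name}-integrator, {label}: lag at 30 s = {err[3000]:.2f}, "
              f"at 60 s = {err[-1]:.2f}")
```

With these settings the one-integrator mechanism settles to zero lag on the rate input, a fixed lag on the acceleration input, and a steadily growing lag on the jerk input, while the two-integrator mechanism removes the lag on rate and acceleration and holds a fixed lag on jerk. The pattern depends on the structure of the loop rather than on the particular gains, provided the loop remains stable.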

If we look at Fig. 6 to determine what types of mechanisms can
be substituted to reproduce performance analogous to that obtained
with the human operator, we see that at the beginning of training
a one-integrator mechanism would be the appropriate substitute.
This is attested by the fact that during early training Ss’
performance (a) did not show significant lag error scores for the
rate inputs, (b) showed constant lag error scores for the
acceleration inputs, and (c) showed increasing lag error scores
with jerk inputs. At the end of training Ss performed analogously
to a two-integrator mechanism, for the data show that reliable lag
error scores were obtained at this stage only with the jerk
inputs. Thus, the improvement in accuracy of performance with
training is accomplished by a change in S’s behavior which is
analogous to a change from a one-integrator to a two-integrator
mechanism in the system.

From Fig. 7 it may be seen that the deterioration in S’s
performance is reflected in the model as a reduction in its
mathematical complexity, the form of the reduction being the
dropping out of an integrator. This conclusion is based on the
fact that Ss’ performance showed constant lag errors for the
acceleration inputs and increasing lag errors for the jerk
inputs.³ This type of performance is equivalent in kind to that
found during the early stages of practice. Thus, it would appear
that the addition of secondary tasks with highly trained Ss
produces a regression in performance to a level of complexity
analogous to that found in the early stages of training. Moreover,
a closer comparison of lag error scores under the secondary tasks
with those during early training leads to the speculation that
performance may have deteriorated to a stage which, in terms of
the model, is less complex than that found during initial
learning.


Fig. 8. An example of the mathematically simplest mechanism
which will minimize tracking error in the system shown in Fig. 1
when the input to the system is a constant rate. (The system is
shown to consist of one integrator with a feed-forward around it.
The circles containing Xs represent differentials which add
algebraically. The triangles represent amplifiers with appropriate
gain settings.)

In general the results of the study indicate 
that the type of mechanism which may be 
substituted to provide performance analogous 
to the human operator differs as a function of 
the amount of practice S has had with the 
system. At the beginning of training S per- 
forms analogously to a one-integrator system. 
As training progresses, S picks up an inte- 
grator, so to speak, until he is performing 
analogously to a two-integrator system. When 
secondary tasks are added, however, the sec- 
ond integrator drops out, leaving the perform- 
ance characteristics of the man analogous to 
a one-integrator system. 


³ The significant lead error scores for the rate inputs are not
explained by the model. However, previous research with similar
systems and with similar experimental designs suggests strongly
the possibility that there might have been an interaction effect
among the input conditions, i.e., since each S had equal training
with all conditions, he may have learned to apply a “correction
factor” in order to keep up with the movement of the target. If
such a factor assumed some approximation of the average of the
corrections needed to keep from lagging on all course inputs, then
lead error scores would develop for the rate inputs even if Ss
were performing analogously to the mechanism appropriate for
reducing the error to zero.


SUMMARY 


Using a compensatory tracking system the 
performance of a man-machine system is ana- 
lyzed in terms of absolute error and of lead 
or lag error scores to compare the effects of 
training and secondary tasks on these meas- 
ures. An analogy is made between the performance of the human
operator and that of the mathematically simplest mechanism which
might be substituted in the system to perform the operator’s task
with minimum error.

The results of the study indicate that the 
type of mechanism which might be substi- 
tuted to provide performance analogous to 
that of the human differs as a function of the 
amount of practice S has had with the sys- 
tem. In general, it was found that at the 
beginning of practice S performs analogously 
to a one-integrator system with feed-forward 
loop. As practice progresses, S in effect picks 
up an integrator, until he is performing analo- 
gously to a two-integrator system. With the 
addition of secondary tasks an integrator is 
dropped out and performance becomes analo- 
gous to the one-integrator system of the un- 
trained S. 


REFERENCES 


Friedman, M. Use of ranks to avoid the assumption of normality
implicit in the analysis of variance. J. Amer. Statist. Ass.,
1937, 32, 675-701.

Garvey, W. D., & Mitnick, L. L. An analysis of tracking behavior
in terms of lead-lag errors. J. exp. Psychol., 1957, 53, 372-378.

Wilcoxon, F. Some rapid approximate statistical procedures.
Stamford, Conn.: American Cyanamid Co., 1949.


(Received August 27, 1959)





Journal of Applied Psychology 
1960, Vol. 44, No. 6, 376-380 


MULTIFINGER TAPPING PERFORMANCE AS A 
FUNCTION OF THE DIRECTION OF 
TAPPING MOVEMENTS 


LYLE R. CREAMER ¹ AND DON A. TRUMBO


Kansas State University 


A number of office and data-processing ma- 
chines require a multiple-finger keying opera- 
tion. Although research on typewriters has 
shown a number of ways in which the design 
of these machines could be improved (Biegel, 
1934, 1935; Dvorak, 1936; Strong, 1956), 
most of the changes suggested by these studies 
would involve extensive relearning time and, 
as a result, have not been implemented. The 
direction of the finger movements apparently 
has not been explored, prior research having 
been concerned primarily with the order and 
spacing of the keys. 

The more basic studies on tapping have 
typically measured rate of single-finger tap- 
ping only, and have explored individual dif- 
ferences, preferred tapping rates, and ampli- 
tude of tapping movements. Task variables, 
such as direction of finger movement, have 
not been explored and were not obvious, since 
the task usually involved tapping on a sta- 
tionary surface or with a single key. Ream 
(1922) reported some incidental evidence for 
the superiority of horizontal tapping move- 
ments over vertical movements, but presented 
no data. Chapanis, Garner, and Morgan 
(1949) concluded that horizontal tapping 
movements were faster and less fatiguing than 
vertical movements. The latter authors at- 
tribute alterations in telegraph keys to this 
fact, although no research is cited. No re- 
search on tapping performance for positions 
intermediate between those requiring vertical 
and horizontal finger movements was found. 

It appeared that a systematic study of the relationship between
multiple-finger tapping performance and the direction of tapping
movements would yield results with implications both for the
response mechanisms involved and for the design of equipment which
requires these responses.


¹ Now at the University of Illinois.


METHOD 


Apparatus. The apparatus consisted of a Woodstock typewriter
with all working parts removed except the eight keys of the
starting position (A, S, D, F, J, K, L, ;). Letters and striking
mechanism were removed so that the only key connection was the
hinge at the rear of the key. The typewriter was cut through the
center from front to back, dividing the keyboard into left-hand
and right-hand halves. These halves were then hinged at the bottom
so that the angle between them could be varied from 180° to 0°.
Thus, each half could be adjusted to any position from the normal,
with vertical tapping movements, to the vertical position, with
horizontal tapping movements in opposing directions. A microswitch
was placed under each key and wired in the normally-open position.
Switches were adjusted so that the keys had the following stroke
lengths: little fingers, 4/16”; ring fingers, 5/16”; middle
fingers, 6/16”; and index fingers, 7/16”. Switches were set to
close at two-thirds of the key stroke; 4 oz. of pressure were
required to depress each key. A steel bar, connecting the two
halves of the apparatus at the back, could be adjusted to lock the
halves at five specific positions. Considering the angle between
the plane of either right-hand or left-hand keyboard and the
normal (horizontal) plane, these positions were: 0°, 22°, 44°,
66°, and 88°. Each microswitch was connected to one pen of an
Esterline Angus Graphic Recorder. The keys were wired to the pens
in the order of the sequence of taps (see below) which constituted
the task for the Ss. Errors in the sequence were easily identified
as deviations from a diagonal of pips across the recording paper.
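The apparatus settings quoted above can be collected into a small calculation. This is a sketch only. The stroke lengths, the two-thirds closing point, and the five positions come from the text, while the function names and the relation between a keyboard position and the angle between the two halves (180° minus twice the tilt of each half) are assumptions made for illustration.

```python
from fractions import Fraction

# Stroke length of each key in inches, as given in the text.
STROKE_INCHES = {
    "little": Fraction(4, 16),
    "ring": Fraction(5, 16),
    "middle": Fraction(6, 16),
    "index": Fraction(7, 16),
}
CLOSE_FRACTION = Fraction(2, 3)       # switch closes at 2/3 of stroke
POSITIONS_DEG = [0, 22, 44, 66, 88]   # tilt of each half from horizontal


def closure_travel(finger):
    """Key travel (inches) at which the microswitch closes."""
    return STROKE_INCHES[finger] * CLOSE_FRACTION


def angle_between_halves(position_deg):
    """Assumed geometry: both halves tilt equally about the hinge."""
    return 180 - 2 * position_deg


for finger in STROKE_INCHES:
    print(f"{finger}: switch closes at {closure_travel(finger)} in.")
for position in POSITIONS_DEG:
    print(f"{position} deg tilt -> {angle_between_halves(position)} deg "
          "between halves")
```

Under the assumed geometry the five positions span nearly the whole adjustable range, from 180° between the halves down to 4°.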

Subjects. Five male nontypists were recruited as Ss from a
general psychology class. None of the Ss had received any previous
training in typing, and typing speed was reported to be less than
25 words per min. Ss were paid for their participation in the
experiment.

Procedure. The keyboard was placed on a desk 31” 
above the floor. Ss were seated in a straightback 
chair and allowed to adjust their position until they 
felt comfortable. 

Fig. 1. Rate of tapping (mean taps per trial) at five keyboard
positions (keyboard angle).

The performance of each S was recorded for 20 consecutive days.
Each day constituted a session and included a 3-min. trial at each
of the five keyboard angles. Preceding the first session, the
apparatus was demonstrated and the task presented to S. The task
was the same for all Ss at all keyboard positions throughout the
20 sessions. S was to start tapping with his little finger, left
hand; then with the little finger, right hand; then ring left;
ring right; middle left; middle right; index left; index right;
little left; etc. S was instructed to continue this sequence
throughout all test periods.

Ss were informed that errors would be scored when they tapped
two keys simultaneously or when they failed to alternate in the
proper sequence. When an error had been made S was to start the
alternating pattern on the next tap, or he could start the
sequence over, beginning again with the left little finger. Ss
were instructed to tap “as fast and accurately as possible.”


On the first day, following the demonstration and 
instructions, each S received a 1-min. practice trial 
at each of the five keyboard positions in the order 
in which he was to receive his first session of test 
trials. After a 2-min. rest following the practice trials 
S was given 3-min. test trials at each of the keyboard 
positions, with 2-min. rest periods interposed. This 
constituted the first of 20 test sessions. In each succeeding 
test session, S was given a 2-min. warm-up 
in the position he was to receive first in the test. 
Following a 2-min. rest, S repeated the sequence of 
3-min. tests in each of the five positions in the order 
determined by the latin square design described 
below. 

Scoring. Three scores were obtained from the continuous 
record. Rate of tapping was a simple frequency 
count, over time, of the number of pips on 
the recording paper for the 3-min. test period. Two 
or more pips at the same point were scored as a 
single tap. An error was scored whenever two or 
more keys were depressed simultaneously, or when S 
failed to alternate either fingers or hands. Reversing 
the alternation (R-L, instead of L-R) was scored as 
one error, but additional errors were not recorded 
during the R-L pattern as long as the alternation 
was maintained. Errors were scored for Block I 
(Sessions 1-5) only, since they constituted less than 
1% of the responses in subsequent sessions. 

The third performance measure was obtained by 
determining the slope of the curve formed by the 
means of the four successive 45-sec. intervals of each 
trial. This score was used as a performance-decrement 
index of fatigue. 

Experimental design. The experimental design was 
basically a 5 X 5 latin square, replicated each day for 
20 days with the same five Ss. The latin square was 
systematic with a particular treatment always preceded 
and followed by the same two treatments. The 
order for Row 1 of the square (22°, 0°, 88°, 44°, 
66°) was determined by assigning numbers to the 
treatments, and noting the order of their appearance 
in a table of random numbers. In each subsequent 
row the order was shifted one place to the left. Ss 
were assigned to rows in the order of their appearance 
for the first session. In the next four sessions, 
Ss received each of the remaining four sequences in 
order, moving down one row in the latin square 
each day. Thus, each S received each sequence once 
in each of the four blocks of five sessions. 

This design permitted an analysis of variance where 
the main effects of keyboard angles, ordinal positions, 
days (latin squares), and sequences, as well as the 
interaction effects of days with angles, ordinal positions, 
and subjects could be evaluated. With the exception 
of sequences, which was tested with the 
residual from the days X subjects interaction, the 
error term for testing each source of variance was 
the pooled residual within latin squares. 
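
The systematic latin square described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, subjects and sessions are numbered from 0, and "moving down one row each day" is assumed to wrap from the last row back to the first.

```python
def cyclic_latin_square(first_row):
    """Build a systematic latin square: each row is the previous row
    shifted one place to the left."""
    n = len(first_row)
    return [first_row[i:] + first_row[:i] for i in range(n)]


def session_order(square, subject, session):
    """Order of keyboard angles for one subject in one session.

    Assumption (not stated in this form in the article): subject k starts
    on row k and moves down one row per session, wrapping around.
    """
    return square[(subject + session) % len(square)]


# Row 1 as reported in the text (keyboard angles in degrees).
ROW_1 = [22, 0, 88, 44, 66]
square = cyclic_latin_square(ROW_1)

for row in square:
    print(row)
```

Because every row is a cyclic shift of Row 1, each angle appears once in every row and every ordinal position, and each angle is always preceded and followed by the same two angles, which is the "systematic" property the authors describe.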


TABLE 1 

ANALYSIS OF VARIANCE FOR RATE OF TAPPING, BLOCK I 

Source                                  df      MS        F 

Angles                                                    4.40** 
Ordinal Positions                                         1.83 
Subjects                                                  147.22** 
Days (latin squares)                                      208.98** 
Interactions: 
  D X A                                         27.37     0.58 
  D X OP                                        41.92     0.89 
  D X S                                         916.00    19.42** 
Sequences                                       109.01    1.27 
Residual                                        85.80 
Pooled residual within latin squares            47.17 
Total                                   123a 

a df corrected for estimating the value 
** p < .01. 







RESULTS 


Data were analyzed for Blocks I and IV to 
indicate initial performance and performance 
after Ss appeared to be reaching an upper 
limit of both speed and accuracy. Examina- 
tion of the data for Blocks II and III yielded 
functions similar to, and intermediate be- 
tween, those for Blocks I and IV (Fig. 1). 

Figure 1 summarizes the results for rate 
of tapping. For both Block I and Block IV 
trials, mean taps per 3-minute trial are plotted 
against the five keyboard positions. Each 
point on each curve is based on five trials for 
each of five Ss, or 25 trials. It may be noted 
that mean number of taps per trial varied 
systematically with the direction of tapping 
movements. The standard “zero” (horizontal 
keyboard) position yielded the poorest per- 
formance both in Block I and, after 15 ses- 
sions of practice, in Block IV. The optimal 
angle for Block I was 66°, but by Block IV 
trials the optimal angle had shifted to 44°. 
Performance at the 88° position (nearly verti- 
cal keyboards) was relatively high in Block I, 
but declined relative to the other positions by 
Block IV. Rate of tapping at this and all 
other positions continued to be higher than 
for the zero position, however. Finally, differences 
between the two curves indicate a rather 
large increase in overall performance from 
Block I to Block IV. 


TABLE 2 

ANALYSIS OF VARIANCE FOR RATE OF TAPPING, BLOCK IV 

Source 

Angles 
Ordinal Positions 
Subjects 
Days (latin squares) 
Interactions: 
  D X A 
  D X OP 
  D X S 
Sequences 
Residual 
Pooled residual within latin squares 
Total 


Fig. 2. Errors at five keyboard positions 
(mean errors per trial plotted against keyboard angle). 

Summaries of the analysis of variance by 
blocks for these data are presented in Tables 
1 and 2. In both cases, the differences among 
keyboard positions were significant (p < .01). 
The Student-Newman-Keuls gap test (Newman, 
1939) failed to reject the hypothesis of 
no difference between any pair of means for 
either Block I or Block IV data. The highly 
significant F value for days (latin squares) 
reflects the large increase in rate of tapping, 
especially for Block I data, attributable to 
practice. Extensive individual differences, both 
in mean rate of tapping and in rate of improvement, 
are indicated by the significant 
F values for subjects and for days X subjects 
interaction, respectively. That these sources 
of variance continue to be significant for 
Block IV, after 15 sessions, is evidence that 
Ss had not yet reached an upper limit of 
speed of tapping. 
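
As a worked check on how the F values relate to the error terms named in the design, the legible mean squares in Table 1 can be divided by the corresponding error mean square. The sketch below is illustrative only: the helper name is hypothetical, and the pairing of each mean square with its row is taken from the table entries as they can be read.

```python
def f_ratio(ms_effect, ms_error):
    """F is the effect mean square divided by the error mean square."""
    return ms_effect / ms_error


# Error terms as read from Table 1 (Block I, rate of tapping).
POOLED_RESIDUAL = 47.17           # pooled residual within latin squares
DAYS_X_SUBJECTS_RESIDUAL = 85.80  # residual used to test sequences

# Legible interaction mean squares from Table 1.
mean_squares = {"D X A": 27.37, "D X OP": 41.92, "D X S": 916.00}

f_values = {
    source: round(f_ratio(ms, POOLED_RESIDUAL), 2)
    for source, ms in mean_squares.items()
}
# Sequences is the one source tested against a different error term.
f_values["Sequences"] = round(f_ratio(109.01, DAYS_X_SUBJECTS_RESIDUAL), 2)

print(f_values)
```

The ratios reproduce the F entries that survive in Table 1 (0.58, 0.89, 19.42, and 1.27), which supports reading 47.17 as the general error term and 85.80 as the error term for sequences.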


Data for mean number of errors as a func- 
tion of keyboard angle are presented in Fig. 2 
for Block I. Results of the analysis of vari- 
ance for error scores (Table 3) indicate a 
significant effect for keyboard angles with 
p< .01. The effect of practice in reducing 
the number of errors is indicated by the sig- 
nificant F for days. As in the case of rate of 
tapping, individual differences both in number 
of errors and in rate of improvement are 
evident in the significant F values for subjects 
and days X subjects interaction. Performance 
accuracy was best at the 88° position 
and poorest at the 0° position. The 
greatest absolute difference among adjacent 
keyboard positions occurred between 0° and 
22°. None of the gaps was significant, however. 


TABLE 3 

ANALYSIS OF VARIANCE FOR ERRORS, BLOCK I 

Source                                  df      MS          F 

Angles                                          1,148.78 
Ordinal Positions                               514.21 
Subjects                                        3,478.59 
Days (latin squares)                            1,424.77 
Interactions: 
  D X A 
  D X OP                                        299.12 
  D X S                                         1,307.57 
Sequences                                                   2.77 
Residual                                        1,459.17 
Pooled residual within latin squares            214.55 
Total 

a df corrected for estimating the value 
** p < .01. 

Performance decrement. Rate of tapping 
during the 3-minute trials decreased with each 
successive 45-second interval. A comparison 
of the mean number of taps for these intervals 
indicated the decrement to be roughly linear 
for each of the keyboard positions. It was as- 
sumed that the decrement was a linear func- 
tion of time, and the slope of this function 
was computed for each keyboard angle for 
Block I. These slope scores were used as an 
index of response-produced decrement, or 
fatigue. The variance attributable to key- 
board angles yielded an F of 1.11, which, 
with 4 and 60 degrees of freedom, was not 
significant. 
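
The slope score can be illustrated with a short Python sketch. The article does not say how the slope was fitted, so an ordinary least-squares line through the four 45-second interval means is assumed here; the function name and the interval means are hypothetical and are not data from the study.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Four successive 45-sec. intervals of one 3-min. trial.
intervals = [1, 2, 3, 4]
# Hypothetical mean taps per interval (illustrative values only).
interval_means = [52.0, 50.0, 49.0, 47.0]

slope = least_squares_slope(intervals, interval_means)
print(f"decrement per 45-sec. interval: {slope:.2f} taps")
```

A more negative slope would indicate a steeper drop in tapping rate within the trial, which is how the score serves as an index of fatigue.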

Effects of practice. Increases in tapping 
rate for the 20 sessions resulted in a relatively 
smooth, negatively accelerated curve. Mean 
number of taps increased by more than 100% 
from Session 1 to Session 20. Rate of tapping 
appeared to have been rapidly approaching an 
upper limit by Block IV sessions. 


DISCUSSION 


Although the optimal angle changed from 
Block I to Block IV sessions, the absolute 
difference between the best and the poorest 
positions remained approximately the same; 
that is, the advantage of the intermediate 
angles was not proportional to the perform- 
ance level, indicating that both practiced and 
unpracticed Ss show about the same absolute 
gains from these angles for the task used in 
this study. 

Possible negative transfer effects or rela- 
tively permanent advantages for the practiced 
typist at the zero position seem unlikely, 
since the responses are basically unchanged 
at the other keyboard positions. Changing the 
keyboard angle involves only changes in the 
direction of tapping movements which should 
not interfere with the S-R relationships. 

The significant days (latin squares) effects 
which appeared in each analysis of variance 
simply reflect increases in speed and decreases 
in errors as a function of the amount of prac- 
tice. Similarly, the highly significant F values 
for subjects and the days X subjects interaction 
demonstrate expected differences among 
Ss, both in performance level and in rate of 
improvement. 

Greater differences were anticipated among 
the experimental conditions on the index of 
fatigue. The performance-decrement ("slope") 
scores were nearly identical for the five key- 
board angles, however. Evidence that the Ss’ 
slope scores may have been insensitive to a 
greater effort expenditure at the angles result- 
ing in slower rates of tapping came from the 
Ss’ verbal reports. Following the final session 
each S was interviewed about his performance 
and his keyboard preferences. All five Ss re- 
ported the zero position to be the most fa- 
tiguing; four of the five indicated that they 
worked harder at the zero position, and preferences 
were split between the 44° and 66° 
positions. These reports suggest that Ss may 
have been able to compensate for the greater 
effort required at the zero position for the 
brief 3-minute trials. Further research using 
longer trials would be predicted to reveal 
greater performance decrements for those positions 
yielding the poorest tapping rate. 
Electromyographic measures of muscular activity 
such as those recently made with typists 
(Lundervold, 1958) might also be expected 
to reveal greater muscular activity at the less 
favorable positions. 

Probably the most significant finding of the 
study from a practical point of view was the 
relatively poor performance with the standard 
horizontal keyboard. This keyboard and the 
experimental task closely resemble the typing 
task and other multifinger tapping or keying 
operations. The implications of the study for 
the design of such keyboards are quite clear. 
Further research with typists and with addi- 
tional tasks is needed before the practical 
gains of such modifications can be quantified, 
however. 


SUMMARY 


Five male Ss were given 3-minute trials at 
each of five keyboard positions for 20 consecutive 
days. The keyboard, consisting of the 
eight keys of the starting position of a typewriter, 
was hinged in the middle, so that the 
direction of tapping movements could be 
varied from horizontal to vertical. The task 
was a simple alternation of both fingers and 
hands. Rate of tapping was found to be greatest 
at the positions intermediate between 
horizontal and vertical keyboards. Errors decreased 
monotonically from the horizontal to 
the vertical keyboard positions, but were 
highly infrequent at all positions after 5 days 
of practice. No significant differences were 
found among the experimental conditions on 
a performance-decrement index of fatigue. 
Practical implications of the findings were 
discussed. 


REFERENCES 


Biegel, R. A. An improved typewriter keyboard. 
Hum. Factor, 1934, 8, 280-285. 

Biegel, R. A. New keyboards for typewriters and 
teleprinters. CR 8th Conf. Int. Psycho-Tech., 
Prague, 1935, 222-225. 

Chapanis, A., Garner, W. R., & Morgan, C. T. Applied 
experimental psychology. New York: Wiley, 
1949. 

Dvorak, A., Merrick, N. I., Dealey, W. L., & Ford, 
G. C. Typewriting behavior. New York: American 
Book, 1936. 

Lundervold, A. Electromyographic investigations 
during typewriting. Ergonomics, 1958, 1, 226-233. 

Newman, D. The distribution of range in samples 
from a normal population, expressed in terms of 
an independent estimate of standard deviation. 
Biometrika, 1939, 31, 20-31. 

Ream, M. J. The tapping test: A measure of motility. 
Psychol. Monogr., 1922, 31(1, Whole No. 
140). 

Strong, E. P. A comparative experiment in simplified 
keyboard retraining and standard keyboard supplementary 
training. Washington, D. C.: General 
Services Administration, 1956. 

(Received December 28, 1959) 





Journal of Applied Psychology 
1960, Vol. 44, No. 6, 381-385 


AN OBJECTIVE VALIDATION OF FACTUAL 
INTERVIEW DATA 


DAVID J. WEISS AND RENE V. DAWIS 1 

Industrial Relations Center, University of Minnesota 


The interview is frequently used to obtain 
certain types of information, yet all too often 
the validity of the information obtained is not 
determined. Many researchers adopt the attitude 
expressed by Benney and Hughes (1956) 
as follows: 


in the research interview, at least—and we can regard 
this as archetypal—the assumption is general 
that information is the more valid the more freely 
given. Such an assumption stresses the voluntary 
character of the interview as a relationship freely 
and willingly entered into by the respondent (p. 139). 


In other words, if it can be ascertained that 
the interviewee was cooperating with the interviewer, 
then the information gathered is, 
by assumption, valid. Without further supporting 
evidence, this assumption is difficult 
to accept. "Face validity" is always suspect. 
Kahn and Cannell (1957) note the extent and 
nature of the problem when they observe that 
studies 

show persistent and important differences between 
interview data and data obtained from other sources 
. . . The interview must be considered a measurement 
device which is fallible, and which is subject 
to substantial errors and biases (p. 179). 


It is necessary to attempt the validation of 
interview data against some external, objective 
criterion, rather than on a subjective 
judgment of "cooperation." 

Probably the criterion most readily available 
on which to validate interview data is 
existing records of one type or another. Problems 
are involved in gaining access to certain 
types of factual records, however, and this is 
further complicated by the fact that these 
criteria themselves are sometimes unreliable 
and/or invalid. Nevertheless, the use of existing 
records as criteria for the validation of 
interview-obtained information seems generally 
accepted (Hyman, Cobb, Feldman, Hart, 
& Stember, 1954; Maccoby & Maccoby, 
1954). 

1 The authors are staff members of the Vocational 
Rehabilitation Research Laboratory of the Industrial 
Relations Center, University of Minnesota. This 
study was supported, in part, by a research Special 
Project grant from the Office of Vocational Rehabilitation, 
United States Department of Health, Education, 
and Welfare. The authors express their sincere 
appreciation to George W. England and Lloyd H. 
Lofquist of the center staff for their expert guidance 
during the conduct of the study and their helpful 
suggestions during the writing of this article. 

Two studies, using existing records, may be 
cited for purposes of illustration. Bancroft 
(1940) compared interview responses of 1,595 
unemployed Philadelphians with relief agency 
records and found the amount of invalid response 
to vary from 8% to 75%, depending 
on the nature of the information requested. 
Hyman (1944-45) validated responses to a 
survey question on the prevalence of savings- 
bond redemption behavior and found invalid- 
ity to range from 7% to 43%, using objec- 
tive records of bond cashing as the external 
criterion. Several other studies of this type in- 
dicate results of the same order with percent- 
ages of invalidity running as high as 50% 
and sometimes even higher (Kahn & Cannell, 
1957). 

A slightly different type of validation criterion 
was used in a study by Keating, Paterson, 
and Stone (1950). These investigators 
reported that interview information obtained 
from unemployed Employment Service applicants 
correlated from .90 to .98 with information 
furnished by their former employers for 
questions on wages, duration of job, and job 
duties. 


The purpose of the present study was to 
determine the validity of certain types of in- 
formation obtained through a survey-type in- 
terview. The present study was part of a pre- 
liminary inquiry into the employment prob- 
lems of the physically handicapped, conducted 
at the Industrial Relations Center (IRC) of 
the University of Minnesota. This study was 
designed as part of a larger effort aimed 
at investigating some of the methodological 
problems involved in research with the physically 
handicapped. 2 

2 For a discussion of these methodological problems, 
see Dawis, Hakes, England, & Lofquist (1958). 


METHOD 

Sample. The sample used in this study totaled 91 
physically handicapped individuals, 48 of whom were 
Division of Vocational Rehabilitation (DVR) counselees, 
and 43 of whom were Employment Service 
(ES) applicants. All individuals were from the Minneapolis-St. 
Paul metropolitan area. The original 
sample, totaling 200, was drawn randomly from a 
list of known physically handicapped individuals 
(DVR counselees and ES applicants). For reasons 
such as change of residence, lack of interviewer contact 
with residents of the household, refusals, and 
failure to identify the handicapped individual as a 
member of the household, interviews were not obtained 
for over half of this number, leaving a final 
sample of 91. A comparison of the final and the 
original samples showed that the sample of 91 did 
not differ significantly from the original random sample 
of 200 in terms of age, sex, education, and broad 
disability classification. 

Procedure. Interviews were conducted by five female 
interviewers who were experienced in survey-type 
interviewing by virtue of their employment by 
an opinion polling agency. Each interviewer was 
given a 3-hour training period, explicit printed instructions, 
and a list of 40 addresses. Interviewers 
were carefully supervised by an IRC staff member. 
Of the 91 interviews obtained, the information was 
furnished by the handicapped person himself in 39 
interviews and by an adult relative of the handicapped 
individual in the remaining 52 interviews. The 
interview was focused on obtaining factual material 
relating to the handicapped individual's disability and 
employment history. 

Validation Criteria. Two types of validation criteria 
were used. For personal data and information 
relating to the individual's disability, the validation 
criteria were data from agency records. The specific 
items from agency records which were used for validation 
were: age, sex, marital status, education, veteran 
status, nature of the disability, age at disablement, 
and having received assistance from the agency. 
The second type of validation criteria were items 
of information supplied by employers. Of the 91 
handicapped individuals, 60 reported being employed. 
Names of employers were obtained for 52 of these 
60 individuals. Each employer was sent a brief questionnaire 
asking for the following information on the 
employee concerned: job title, job duties, hours, pay, 
and length of employment. All 52 questionnaires 
were returned by the employers. However, records 
for two individuals were not found in the employers' 
files. Thus, a total of 50 usable sets of employer-furnished 
information was obtained for validation 
purposes. 


TABLE 1 

COMPARISON OF VALIDITY OF INTERVIEW INFORMATION FOR 
HANDICAPPED PERSONS AND ADULT RELATIVES 

Columns (N and % with Invalid Information for each group): 
  DVR Counselees: Handicapped Persons; Adult Relatives 
  ES Applicants:  Handicapped Persons; Adult Relatives 

Item a 

Age 
Sex 
Marital status 
Education 
Veteran status 
Nature of disability 
Age at disablement 
Received assistance 
Job title 
Job duties 
Hours 
Pay 
Length of employment 

a First 8 items validated against agency records. 
b Information not available for ... 


RESULTS 


As a first step in the analysis, an item-by-item 
comparison of the validity of information 
obtained from adult relatives of the 
handicapped persons with that obtained from 
handicapped persons themselves was conducted 
to determine the feasibility of combining 
information obtained from these two 
sources. The comparison was carried out separately 
for each item within each subsample 
(DVR and ES). Validation results were expressed 
in terms of percentage of group for 
which invalid information was obtained. Chi 
square analysis and the Fisher exact probability 
test were used in the comparisons. 
Table 1 summarizes the data on these comparisons. 

No significant differences in validity were 
found between information obtained from 
adult relatives of handicapped persons and 
that obtained from the handicapped persons 
themselves. Comparison of handicapped indi- 
viduals from whom information was obtained 
directly with those for whom information was 
obtained from adult relatives (undertaken 
separately for DVR and ES subsamples) re- 
vealed no significant differences in sex, age, 
marital status, education, and broad disabil- 
ity grouping. On the basis of these results, 
data obtained from the two sources were com- 
bined for each item, but separately for DVR 
and ES subsamples, for purposes of further 
analysis. 
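
The kind of item-by-item comparison described above can be sketched with a small Fisher exact test on a 2 X 2 table (respondent type by valid or invalid answer). This is a minimal illustration, not the authors' computation: the function name is hypothetical, the two-sided p value is assumed to be the sum of the probabilities of all tables no more likely than the observed one, and the counts in the usage lines are invented for the example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2 x 2 table [[a, b], [c, d]].

    Rows are the two respondent groups; columns are invalid / valid.
    The margins are held fixed and hypergeometric probabilities are summed
    over every table that is no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * tolerance
    )


# Hypothetical counts for one item in one subsample:
# handicapped persons: 3 invalid, 14 valid; adult relatives: 5 invalid, 26 valid.
p_value = fisher_exact_two_sided(3, 14, 5, 26)
print(f"two-sided p = {p_value:.3f}")
```

With cell counts this small the exact test is preferable to chi square, which is presumably why the authors used both procedures.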

The feasibility of combining the data for 
the two subsamples (DVR and ES) was de- 
termined next. DVR and ES subsamples were 
compared on each of 13 items to see if any 
differences existed in the validity of the in- 
formation obtained. Table 2 shows these com- 
parisons. 

TABLE 2 

COMPARISON OF VALIDITY OF INTERVIEW INFORMATION 
FOR DVR AND ES SUBSAMPLES 

Columns (% with Invalid Information): 
  DVR Counselees; ES Applicants; Total Sample 

Item a 

Age 
Sex 
Marital status 
Education 
Veteran status 
Nature of disability 
Age at disablement 
Received assistance 
Job title 
Job duties 
Hours 
Pay 
Length of employment 

a First 8 items validated against agency records. 
b Information not available for ... 


Table 2 indicates that the percentage of 
invalid information for DVR and ES subsamples 
differed significantly on two items: 
marital status and amount of education. On 
both items more invalidity was found for the 
DVR subsample than for the ES subsample. 
However, invalidity in these items might be 
spurious, i.e., interview data could have reflected 
actual changes in marital status and 
educational attainment. On one other item, 
length of employment, validity differences 
(between the subsamples) approached the 
.05 level of significance. On the whole, sub- 
sample differences on all three items did not 
appear to have any practical significance. 

Median invalidity for the total sample 
across all items was 16%, with a range from 
0% to 55%. The least amount of invalidity 
(0%) occurred on the item referring to the 
sex of the handicapped person. A relatively 
low amount of invalidity (10%) was found 
on the question referring to the nature of the 
disability. The highest proportion of invalidity 
(55%) was obtained on the question of 
having received assistance from the agency. 
In addition, relatively high percentages of in- 
valid information were obtained on length of 
employment, age at disablement, and educa- 
tion items for the DVR subsample; and on 
pay and job title items for the ES subsample. 

Interview data on items pertinent to em- 
ployment (validated against employer-fur- 
nished information) had a median invalidity 
of 22%, with a range from 10% to 29%. In 
contrast, data on personal history items (vali- 
dated against agency records) had a median 
invalidity of only 12.5%, but with a range 
from 0% to 55%. These findings are consist- 
ent with results of other studies (cf. Ban- 
croft, 1940; Keating et al., 1950). 
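
The summary statistics reported above (percentage of invalid information per item, then the median and range across items) can be sketched as follows. The data and the function name are hypothetical and serve only to show the arithmetic; an interview answer is assumed to count as invalid whenever it disagrees with the criterion record.

```python
from statistics import median


def percent_invalid(interview_values, criterion_values):
    """Percentage of cases in which the interview answer disagrees
    with the external criterion (agency or employer record)."""
    pairs = list(zip(interview_values, criterion_values))
    mismatches = sum(1 for said, recorded in pairs if said != recorded)
    return 100.0 * mismatches / len(pairs)


# Hypothetical answers for four respondents on three items:
# (interview answers, criterion records).
items = {
    "sex": (["M", "F", "M", "M"], ["M", "F", "M", "M"]),
    "age": ([34, 51, 28, 45], [34, 50, 28, 45]),
    "received assistance": (["no", "no", "yes", "no"],
                            ["yes", "yes", "yes", "no"]),
}

rates = {item: percent_invalid(said, recorded)
         for item, (said, recorded) in items.items()}

median_rate = median(rates.values())
lowest, highest = min(rates.values()), max(rates.values())
print(rates)
print(f"median = {median_rate}%, range = {lowest}% to {highest}%")
```

Reporting the median with the range, as the authors do, keeps one extreme item (such as the agency-assistance question) from dominating the summary.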


SUMMARY AND CONCLUSIONS 


Interviews were used to obtain information 
on the personal characteristics and employ- 
ment of 48 DVR counselees and 43 ES ap- 
plicants. Seventeen DVR counselees and 22 
ES applicants furnished the information them- 
selves, while adult relatives furnished the in- 
formation for the rest of the sample. This 
information was validated against agency rec- 
ords (for personal history items) and infor- 
mation furnished by the employer (for em- 
ployment-related items). Results of the vali- 
dation may be summarized as follows: 

1. Information obtained from relatives of 
handicapped persons was found to be as 
valid as information obtained from handicapped 
individuals themselves. That is, secondary 
sources can be as trustworthy as primary 
sources. This conclusion, however, is 
limited to the types of information obtained 
for this study. 

2. Slight differences were observed in the 
validity of interview information for different 
interviewee groups (in this study, DVR coun- 
selees and ES applicants). These differences, 
however, were not practically significant. 
3. Validity varied with the type of infor- 
mation. In this study, validity ranged from 
100% (for the question on the sex of the 
subject) to less than 50% (for the question 
on whether the subject had received any assistance 
from the agency). This finding emphasizes 
the necessity of validating each interview 
question which purports to elicit 
factual information. It also raises questions 
about the research value of interview data 
which is based on nothing more than "face 
validity," i.e., rapport between interviewer 
and interviewee. 

One other conclusion to be drawn from the 
last-mentioned finding tends to support previ- 
ous research. Several investigators (Hyman 
et al., 1954; Maccoby & Maccoby, 1954; 
Parry & Crossley, 1950-51) have indicated 
that relative levels of invalidity in interview 
data seem to follow pressures of “social de- 
sirability.” The largest amount of invalidity 
in the present study was obtained for the 
question on whether the handicapped indi- 
vidual had ever received assistance from any 
agency. Relatively large amounts of invalidity 
were obtained for questions on length of em- 
ployment, job title, and pay. It is conceivable 
that responses to these questions are influenced 
to some extent by considerations of social 
desirability and ego involvement, especially 
response to the question on receiving 
agency assistance. 

With findings such as those reported above, 
as well as those reported in other studies, it 
is indefensible to assume the validity of pur- 
portedly factual data obtained by interview. 


REFERENCES 


Bancroft, Gertrude. Consistency of information 
from records and interviews. J. Amer. Statist. Ass., 
1940, 35, 377-381. 

Benney, M., & Hughes, E. C. Of sociology and the 
interview: Editorial preface. Amer. J. Sociol., 1956, 
62, 137-142. 

Dawis, R. V., Hakes, D. T., England, G. W., & 
Lofquist, L. H. Minnesota studies in vocational 
rehabilitation: V. Methodological problems in rehabilitation 
research. Industr. Relat. Cent. Bull., 
1958, No. 25. 

Hyman, H. H. Do they tell the truth? Publ. opin. 
Quart., 1944-45, 8, 557-559. 

Hyman, H. H., Cobb, W. J., Feldman, J. J., Hart, 
C. W., & Stember, C. H. Interviewing in social research. 
Chicago: Univer. Chicago Press, 1954. 

Kahn, R. L., & Cannell, C. F. The dynamics of interviewing: 
Theory, technique, and cases. New 
York: Wiley, 1957. 

Keating, Elizabeth, Paterson, D. G., & Stone, 
C. H. Validity of work history obtained by interview. 
J. appl. Psychol., 1950, 34, 6-11. 

Maccoby, Eleanor, & Maccoby, N. The interview: 
A tool of social science. In G. Lindzey (Ed.), 
Handbook of social psychology. Vol. I. Cambridge, 
Mass.: Addison-Wesley, 1954. Pp. 449-487. 

Parry, H. J., & Crossley, Helen M. Validity of responses 
to survey questions. Publ. opin. Quart., 
1950-51, 14, 61-80. 


(Received January 6, 1960) 





Journal of Applied Psychology 
1960, Vol. 44, No. 6 


MOTIVATION IN MANAGEMENT: 


A STUDY OF FOUR MANAGERIAL LEVELS 


HJALMAR ROSEN AND CHARLES G. WEAVER 


University of Illinois 


As Mason Haire (1959) has pointed out, 
there is really very little known about the 
motivation of management. Aside from re- 
search reports that indicate that the higher 
the occupational level, the higher the job 
satisfaction (Centers, 1948; Herzberg, Maus- 
ner, Peterson, & Capwell, 1957), the data are 
very limited. Fortune magazine (1947) re- 
ported a comparative study of different oc- 
cupational levels from executive-professional 
to factory worker with regard to the relative 
significance of income versus security that in- 
dicated that the executive was more inter- 
ested in monetary reward and less in security 
than the factory worker. Benge (1959) and 
Herzberg et al. (1957) report that middle 
management has poor morale because they 
have to carry out decisions for which they 
have no responsibility and are held responsible 
for subordinates over whom they have 
no authority. Mullen (1954) reported that 
supervisors have a great need for information 
regarding their status and progress on the 
job, for a role in policy formation, and for a 
chance to present their ideas for considera- 
tion. Houser (1938) reported the following 
10 elements as being important to business 
executives: 


1. Knowing whether their work is improving or 
not 

2. Having the opportunity for fair treatment 
when bringing to their superior things they do not 
like about their jobs 

3. Having a fair opportunity to offer suggestions 
about their job 

4. Not receiving conflicting orders from their superiors 

5. Receiving adequate authority to get their subordinates 
to do what they want them to do 

6. Assurance that promotions will go to the best 
qualified man in the organization 

7. Being given adequate information about plans 
and policies that influence their work 

8. Not having their work interfered with by a 
superior officer in their organization 

9. Assurance of pay increases when deserved 

10. Getting the same pay as that for other positions 
in the organization of equal responsibility and 
importance 


Benge (1959) reported that supervisors in a 
cross-country survey thought the following 
were important: 


1. Opportunity to get information from a face-to-face 
contact 

2. Receiving sufficient advanced information 

3. Having enough authority to carry out their 
duties and responsibilities 


Henry’s classic study (1949), although less 
specific, indicated a great drive toward up- 
ward mobility among executives, and Katona 
(1951, pp. 204-210) in studying executives 
postulated an identification with organization, 
i.e., the organizational goals and the goals of 
the executives become indistinguishable. 

In the reported studies there is a tendency to distinguish between various
levels of management, although often the classes are so broad and ill
defined that it is difficult to distinguish between them. On the surface,
such classification seems justified. It is readily apparent that within the
management class within a given industrial plant, let alone in industry at
large, there is a considerable heterogeneity in status, potential for
upward mobility, and specific occupational role demands. Even the
supervisory function, so often identified with the management role, does
not apply universally to all members of management, e.g., the
quasi-professionals who make up the staff specialist category often have no
supervisory responsibility. Yet, all members of management operate within
the confines of an autocratic structure, and maintenance and improvement of
status is dependent upon their performance in carrying out their
responsibilities, not upon their extra-organizational affiliations such as
unions. To what extent then, when analyzing the motivations of management,
can one generalize about the “managerial class”?


This study was undertaken with the pri- 
mary purpose of determining whether or not 
motivational commonality exists among vari- 
ous levels of management with regard to what 
managers want from their jobs and the im- 
portance that they attach to various job con- 
ditions. Of secondary interest was relating the 
job desires of each class to the general find- 
ings previously reported. 

If motivational disparity is found to exist 
among differing managerial levels, then the 
general tendency to speak of “management” 
as meaningful, distinguishable classes, moti- 
vationally, is justified. If, on the other hand, 
a motivational commonality is found to exist, 
even within the relatively restricted realm of 
this study, then there is some basis for ac- 
cepting the class as a legitimate unit of study; 
and perhaps of equal importance, to speculate
about the genesis of the motivational commonality.


METHOD
Sample

To eliminate interplant variance, the subjects were selected from a single
organizational unit. The company represented one plant of one of the
leading and oldest manufacturers of farm implements and machinery in the
United States. The plant employed approximately 2,000 workers and was
located in a moderately sized urban center. In conversations with plant and
corporation personnel, it was reported that the plant, until a few years
ago, was characterized by severe management incompetence in all managerial
levels. To cope with the problem, a general manager was installed by the
parent corporation and was given unilateral control of the plant re
decisions and policy formation.

The population studied consisted of 155 men, the total managerial force of
the plant, excluding the general manager. For the purposes of the study,
the men were categorized into four levels with regard to status and job
responsibilities.¹ Included were seven top managers, the highest group of
managers within the plant, directly under the general manager; 36 middle
managers, all of whom reported to top management, including such job titles
as personnel manager, general foreman, accounting supervisor, etc.; 65
staff specialists, e.g., engineers, time study men, methods analysts, etc.,
who reported to middle management; and 46 first line supervisors (foremen)
who were responsible to middle management.

Each man regardless of level was evaluated by his superiors in terms of how
well he carried out the responsibilities assigned to him. In all but the
staff specialist population, supervisory responsibility was part of the job
specification. Although each man was judged with respect to his
contribution to the organization, i.e., job performance, the overall
effectiveness of the plant was the sole responsibility of the general
manager.

¹ The categorization was based upon organizational charts, job
descriptions, and prolonged discussion with the general manager.

TABLE 1
TWENTY-FOUR CONDITIONS OF WORK USED AS STIMULI ITEMS

Area 1. Relations with Superiors
 1. Having the opportunity to talk over problems with my superior
 2. Knowing whose orders to follow
 3. Working under a superior who explains any changes he makes
 4. Having superiors who will help me out, but not take over when I get
    into a jam
 5. Working under superiors who judge me solely in terms of merit
 6. Working under superiors who delegate as much of their authority as
    possible
 7. Knowing where I stand as far as my superiors are concerned
 8. Working under men who attempt to develop their subordinates
 9. Working under superiors who recognize the problems involved in my work
10. Working under a man who will take over and do the job for me when I
    get into a jam

Area 2. Company Practices and Policies
11. Having knowledge of plant plans that affect me and my job
12. Working in a plant where the responsibilities of every supervisor are
    clearly defined
13. Working for a company that stresses experience more than education for
    promotion
14. Working in a plant that is operated efficiently
15. Working in a plant that has clear cut, long range objectives

Area 3. Relations with Peers
16. Having the other managers at [my level recognize] the importance of my
    work
17. Having knowledge of what others are doing in as much as it may affect
    me and my job
18. Having mutual cooperation among the managers at my level
19. Working with fellow managers who recognize the problems involved in my
    work
20. Working with fellow managers who will help me out when I get into a jam

Area 4. Opportunity for Self Expression
21. Having the opportunity to share in plant policy making decisions
22. Being consulted before decisions are made which concern me and my
    department
23. Having sufficient authority for the job expected of me
24. Having management meetings where everyone can have his say


Method 


The data were collected by means of a highly structured questionnaire made
up of 24 items roughly categorized into four major areas: relations with
superior, company policies and practices, peer relationships, and
opportunities for self-expression. Stimuli items that were utilized are
presented in Table 1. Job condition stimuli were selected on the basis of
the literature and discussion with the general manager. They were selected
to emphasize only the desirable, nonmaterial aspects of work.

The stimuli were prefaced by the following instructional set:

The conditions of work listed in this section have been given by managers
as things they consider to be important in their work. Obviously, you may
not think they are of equal importance. We would like you to tell us how
important you think each condition is by using the following alternatives.


Six response categories were provided having a priori weights assigned from
4 through 0 respectively. They were: (4) Essential, (3) Very Important, (2)
Quite Important, (1) Of Little Importance, (0) Of No Importance, and (x)
Undesirable.² The last category was included to provide a meaningful
alternative if the a priori selection of items in terms of a desirable
continuum was incorrect. Unlike most response alternatives, the meaning
intended by the response selection was spelled out in terms of behavioral
acts.³

² The few individuals who utilized this category were deleted from the
analysis.

³ “Essential” was defined as: “A condition of work which you feel must be
present in your job. If it were not present you would try to find another
job as soon as possible.” “Very Important” was defined as: “A condition of
work you feel is highly desirable. If your job did not have this
characteristic, you would seriously consider looking for another job.”
“Quite Important” was defined as: “A condition of work that you consider to
be desirable. If a job did not have this characteristic you might consider
looking for another job but probably would not actually do so.” “Of Little
Importance” was defined as: “A condition of work you would rather have than
not have. If your job did not have this characteristic you might complain,
but you would never consider looking for another job.” “Of No Importance”
was defined as: “A condition of work that you feel makes no difference to
you one way or another.” “Undesirable” was defined as: “A condition of work
that you do not feel should be present in your job.”

It was felt that by defining the alternatives in terms of actual behavior,
the respondents would tend to give more realistic responses.

Data were analyzed in terms of descriptive statistics, i.e., means and
standard deviations, and in terms of intergroup comparisons, i.e., t scores
and Spearman rho's.
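The scoring and analysis steps described above can be illustrated with a minimal sketch. The weights and category labels come from the text; everything else is an assumption made for illustration: the respondent data are hypothetical, the deletion rule is read as removing any respondent who used the Undesirable category at all, the standard deviation is taken as the sample form, and the rank-difference formula shown is the untied-ranks version of Spearman's rho.

```python
from statistics import mean, stdev

# A priori weights stated in the Method; "Undesirable" (x) carries no weight.
WEIGHTS = {
    "Essential": 4,
    "Very Important": 3,
    "Quite Important": 2,
    "Of Little Importance": 1,
    "Of No Importance": 0,
}
UNDESIRABLE = "Undesirable"

def drop_undesirable(respondents):
    """Remove every respondent who marked any item as Undesirable (assumed rule)."""
    return [r for r in respondents if UNDESIRABLE not in r.values()]

def item_stats(respondents, item):
    """Mean and sample standard deviation of the weights given to one item."""
    scores = [WEIGHTS[r[item]] for r in respondents]
    return mean(scores), stdev(scores)

def rank_difference_rho(ranks_a, ranks_b):
    """Spearman rank-difference correlation for two sets of untied ranks."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical responses of five managers to Item 1 and Item 10.
respondents = [
    {1: "Essential", 10: "Of No Importance"},
    {1: "Very Important", 10: "Of Little Importance"},
    {1: "Very Important", 10: "Of No Importance"},
    {1: "Quite Important", 10: "Of Little Importance"},
    {1: "Essential", 10: UNDESIRABLE},  # removed before analysis
]
kept = drop_undesirable(respondents)
item1_mean, item1_sd = item_stats(kept, 1)
```

With these made-up responses, four respondents remain and Item 1 averages 3.00, i.e., exactly at the Very Important level used as the criterion in the Results.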


RESULTS

Using a crude criterion of 3.00 or higher, i.e., Very Important to
Essential, to designate highly significant areas, it becomes apparent from
the study of Table 2 that no area scores achieve this level for the top
management group. The following items, however, reach or exceed the
criterion:

1. Having the opportunity to talk over problems with my superior
2. Knowing whose orders to follow
11. Having knowledge of plant plans that affect me and my job
22. Being consulted before decisions are made concerning me or my
department
23. Having sufficient authority for the job expected of me

It should be noted that no highly significant items dealt with relationship
with peers.

Like top management, middle management did not have any area scores
reaching or exceeding the criterion. Of the four items considered to be
highly significant by this management group, all were included by top
management as well. Item 23, relating to authority, although ranked fifth
in importance by this group (see Table 3), approached but did not reach the
criterion. Again, Area 3, relationship with peers, was not represented.

Staff specialists considered four conditions of work to be highly
significant. Items 1, 2, and 23 they shared with the two higher managerial
levels. But unlike the other two levels, they indicated concern with Item 7
(Knowing where I stand as far as my superior is concerned). It should be
noted, however, by referring to Table 3 that both top and middle management
ranked Item 7 as being of considerable relative importance, and that staff
specialists considered Item 22 high among their ordering of conditions of
work. Once again Area 3 was not represented.


First line supervision had many more conditions of work that they
considered to be highly significant than any other group. Items 1, 2, 22,
and 23 they shared with top and middle management. In addition, they shared
Item 7 with staff specialists. With regard to the following five items,
however, they were unique in the importance attributed to them:

4. Having superiors who will help out, but not take over when I get into a
jam
5. Working under a superior who [judges me] solely in terms of my merit
9. Working under a superior who recognizes the problems involved in my work
12. Working in a plant where the responsibilities of every supervisor are
clearly defined
14. Working in a plant that is operated efficiently

TABLE 2
MEANS AND STANDARD DEVIATIONS RE IMPORTANCE OF WORK [CONDITIONS] FOR FOUR
LEVELS OF MANAGEMENT

[Columns: Top Management, Middle Management, Staff Specialist, First Line
Supervision; Mean and SD for each. The table body is not recoverable from
the source scan. Surviving Staff Specialist means, in source order: 3.276,
3.230, 2.615, 2.584, 2.876, 2.400, 3.108, 2.984, 2.815, [gap], 1.661,
2.892, 3.384, 2.030.]

Turning to items of minimal importance, i.e., item average of 1.00 or less,
only two cases occur. Both middle management and staff specialists consider
Item 10 (Working under a man who will take over and do the job for me when
I get into a jam) as a relatively insignificant condition of work. It
should be noted that although top management and first line supervision
exceed the criterion, they do so only by the smallest margins.

TABLE 3
ORDINAL POSITIONS OF CONDITIONS OF WORK IN RANK ORDER FOR FOUR MANAGEMENT
LEVELS AND RANK CORRELATIONS

[Rank values for the four levels and the rank correlation coefficients are
not recoverable from the source scan. The row labels survive as follows.]

Item Summary Content
 1. Talk with superior
 2. Know who to follow
 3. Superior explains changes
 4. Superior helps out
 5. Judge by merit
 6. Superiors delegate authority
 7. Knowing how stand
 8. Superior develops subordinates
 9. Superior recognizes problems
10. Superior will take over
11. Knowledge [of plant plans]
12. Responsibilities [defined]
13. Experience [stressed for promotion]
14. Efficient plant
15. Plant with long range objectives
16. Peers recognize importance of job
17. Know what others are doing
18. Cooperation among peers
19. Peers recognize problems
20. Peers help out
21. Share in policy making
22. Consulted on decisions affecting my job
23. Sufficient authority
24. Can have say in meetings

Rank correlations reported for:
Top Management versus Middle Management
Top Management versus Staff Specialists
Top Management versus First Line Supervision
Middle Management versus Staff Specialists
Middle Management versus First Line Supervision
Staff Specialists versus First Line Supervision

Turning to Table 3, from the rank difference correlations one could quite
accurately say that the men in the four managerial levels assess the
importance of job conditions in much the same manner. But the question
remains, even though the profiles of the managerial levels with reference
to the 24 conditions of work are very comparable, are the [...] various
elements equivalent across groups? The data in Table 4 provide an answer.

Of the 144 intergroup differences possible, only four, or approximately
2.8%, are significant at the .01 level. Thirteen differences, or
approximately 9%, reach or exceed the .05 level. Considering that
statisticians such as McNemar (1949) indicate the desirability of
using the more rigorous significance levels when dealing with small sample
differences, one could propose that the differences among the managerial
levels re importance attributed to various aspects of work could be
accounted for largely by chance. One will note, however, that of the 13
differences that reach or exceed the .05 level, 11 of them relate to
differences between the first line supervisors and other levels. Again it
must be pointed out that utilizing the .01 level only three relationships,
or 4.2% of the potential, are significant. In other words, such differences
easily could be due to chance factors. Literally, then, there is real doubt
as to whether or not true differences exist among the groups. To the extent
that the differences can be considered meaningful, however, the first line
supervisors appear to be the deviant group.

TABLE 4
SIGNIFICANT DIFFERENCES BETWEEN MANAGERIAL LEVELS IN TERMS OF IMPORTANCE
ATTRIBUTED TO VARIOUS CONDITIONS OF WORK AS EVIDENCED BY t SCORES

[The table body, listing the pairs of levels that differed on each item, is
not recoverable from the source scan.]
Summarizing Table 4, first line supervisors differ from all other groups
with regard to the greater importance they attach to Item 13 (Working for a
company that stresses experience more than education for promotion). They
differ from middle management in that they consider the following
conditions to be more important:

5. Working under superiors who judge me solely in terms of merit
9. Working under superiors who recognize the problems involved in my work
12. Working in a plant where the responsibilities of every supervisor are
clearly defined
23. Having sufficient authority for the job expected of me

The first line supervisors differ from staff specialists in that they give
greater emphasis to Item 4 (Having superiors who will help me out, but not
take over, when I get into a jam), Item 10 (Working under a man who will
take over and do the job for me when I get into a jam), and Item 17 (Having
knowledge of what others are doing in as much as it may affect me and my
job). Only with regard to Item 8 (Working under men who attempt to develop
their subordinates) do the first line supervisors give significantly less
emphasis than staff specialists.
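The tallies of significant differences discussed in the Results follow from simple counting, which the sketch below retraces. The counts (4, 13, and 3 significant differences) are taken from the text; the expected-by-chance figures are an added illustration that assumes the t tests are independent and every null hypothesis is true, which the authors do not claim.

```python
ITEMS = 24    # conditions of work in the questionnaire
LEVELS = 4    # managerial levels compared

pairs = LEVELS * (LEVELS - 1) // 2       # 6 pairs of levels
comparisons = ITEMS * pairs              # 144 possible intergroup differences

def pct(count, total):
    """Share of `total` represented by `count`, as a percentage to one decimal."""
    return round(100 * count / total, 1)

share_01 = pct(4, comparisons)           # differences significant at .01
share_05 = pct(13, comparisons)          # differences reaching .05

# Comparisons that involve first line supervisors: 3 other levels x 24 items.
first_line_comparisons = ITEMS * (LEVELS - 1)
first_line_share_01 = pct(3, first_line_comparisons)

# Counts expected by chance alone (assumption: independent tests, all nulls true).
expected_05 = 0.05 * comparisons
expected_01 = 0.01 * comparisons

print(comparisons, share_01, share_05, first_line_share_01)
print(round(expected_05, 2), round(expected_01, 2))
```

Under those assumptions roughly 7 of the 144 differences would reach the .05 level by chance, which is the background for the authors' caution about the 13 that did.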


DISCUSSION 


What are the implications of these findings? 
Essentially, the relative importance attributed 
to various aspects of work is comparable to
the findings of other studies, although role 
in policy formation and the opportunity for 
upward communication are not stressed as 
strongly by the management personnel in 
this study as has been reported in the litera- 
ture. Basically, these men, regardless of their 
status within the hierarchy, are oriented 
rather specifically toward conditions of work 
that will permit them to function effectively 
in carrying out their responsibilities. But what 
characterizes the role obligations of these men 
that may account for uniformity in this re- 
spect? Differences in job specifications, status 
and potential for upward mobility have been
pointed out. What are the commonalities? 
The managerial hierarchy like any other so- 
cial structure has role differentiation; each 
position serves a function for the organiza- 
tion. But each role holder contributes only a 
segment of the necessary conditions that will 
lead to organizational effectiveness whether 
he be a foreman or a top manager. The efforts 
of any one man or managerial level cannot in- 
sure organizational success. 

Of the managers in this study, each was re- 
sponsible for a segment of behavior that con- 
tributed to organizational effectiveness. Each 
was judged in terms of how well he performed 
his organizational obligations, not in terms of 
how well the organization as a whole fared.
Unlike the rank-and-file worker, status main- 
tenance and advancement of these managers 
was dependent solely upon the performance 
of their jobs as evaluated by their superiors. 
Seniority, minimum standards, and job se- 
curity growing out of a work group orienta- 
tion such as unionism were not variables in 
their environment. Predictably, then, the managers would tend to focus
upon those conditions that would maximize job effectiveness
and consequently the evaluation of their
performance in the eyes of their superiors. 
It seems plausible that commonality existed 
within the four levels of management studied 
because the organization made equivalent de- 
mands upon them, i.e., they were judged in 
terms of their job effectiveness. Consequently, 
only those variables directly related to job 
success were given maximum weight. 

This theorizing does not parallel the writ- 
ings of Katona (1951) who has suggested an 
identification of the executive with the goals 
and welfare of the organization. If Katona’s 
contentions were borne out, one would have expected dominant concern with
factors leading to organizational effectiveness, not job effectiveness per
se. Although of some importance to this group, such conditions did not
dominate their motivational patterns. It is possible that the motivational
patterns discussed in this paper fail to apply only at the level of plant
managers and higher, or perhaps only to men who are primarily integrators
and policy makers, having a broad perspective and a criterion of personal
success based in organizational rather than job effectiveness. It is
possible that the potentials for
motivational differences within the “mana- 
gerial” class are not fully tested by studying 
a population such as that tapped in this study. 
Although a high degree of commonality ex- 
isted within the levels probed, if corporate 
level or general level executives had been in- 
cluded within the subject of this study, it is 
possible that differentiation would have been 
apparent. 

CONCLUSIONS 

Within the four levels of management 
studied, there was a high degree of com- 
monality regarding conditions of work con- 
sidered to be important. In general, the em- 
phasis appeared to relate to those factors in 
the environment that permitted the manager 
to perform his job duties effectively. It is proposed that the commonality
present occurred in spite of status and role differentiation because, in
essence, the managers studied were evaluated in terms of how well they
carried out their relatively restricted responsibilities, not in terms of
the effectiveness of the over-all organization. It was also hypothesized
that if differences among management levels were to be discovered, they
would occur between management classes responsible for organizational
efficiency and other levels, such as those 
probed in this study, who were responsible 
only for their jobs. 

In a sense, then, perhaps one can accurately 
talk about “management” as a meaningful, 
cohesive class sharing common motivations re 
what they want from their work, if their re- 
sponsibilities are defined in terms of job 
rather than organizational effectiveness. Dis- 
tinctions, with regard to status and occupa- 
tional role, may only be important if they re- 
late to responsibilities tied to organization 
rather than job. Among managers with roles 
primarily defined in terms of job perform- 
ance, however, differences in responsibilities 
and status, although apparent, may be psy- 
chologically and motivationally unimportant. 


REFERENCES

Benge, E. J. Morale of supervisors. Advanc. Mgmt., 1959, 24(3), 17-19.

Centers, R. Motivational aspects of occupational stratification. J. soc.
Psychol., 1948, 28, 187-218.

Fortune. Fortune Survey. 1947, 35(Jan.), 10

Haire, M. Psychological problems relevant to business and industry.
Psychol. Bull., 1959, 56, 169-194.

Henry, W. E. The business executive: The psychodynamics of a social role.
Amer. J. Sociol., 1949, 28, 286-291.

Herzberg, F., Mausner, B., Peterson, R. O., & Capwell, D. Job attitudes:
Review of research and opinion. Pittsburgh, Penn.: Psychological Service of
Pittsburgh, 1957.

Houser, D. R. What people want from business. New York: McGraw-Hill, 1938.

Katona, G. Psychological analysis of economic behavior. New York:
McGraw-Hill, 1951.

McNemar, Q. Psychological statistics. New York: Wiley, 1949.

Mullen, J. H. The supervisor assesses his job in management: Highlights of
a nationwide survey. Personnel, 1954, 31, 94-108.


(Received January 25, 1960) 





Journal of Applied Psychology
1960, Vol. 44, N 6. 39 


FOUR SEMANTIC RATING SCALES COMPARED ¹

WILLIAM D. WELLS
Newark College of Rutgers University

GEORGIANNA SMITH
Benton and Bowles, Inc.

¹ This study was conducted by the Research Department of Benton and Bowles,
Inc.

Users of semantic rating scales need to make a number of minor but
troublesome decisions about scale format and administration: Should the
scales be presented as unbroken lines, or should they consist of discrete
steps? If discrete steps, how many steps should there be? Should the steps
be defined by letters, numbers, words, phrases, or just left blank? If
several objects are to be rated, should they be rated singly, or all
together?

Published research does not provide much guidance for making decisions of
this kind. Most rating scale users appear to have adopted whatever form
seemed reasonable and convenient, with little attempt at experimental trial
of possible alternatives. Because of this lack of comparative evaluation,
it is possible that some of our most widely used scale formats and
procedures are less efficient than others which might have come into use
instead.

Here, evidence is presented on two issues: (a) Should scale steps be
defined by adverbial qualifiers (for example: extremely, very, fairly,
slightly), or just left blank? (b) When several objects are to be rated on
a number of scales, should one object be rated on all scales before another
object is rated, or should all objects be rated simultaneously, one scale
at a time?

The argument in favor of leaving scale steps blank rests on the fact that
it is extremely difficult to find adverbial qualifiers which meet the
requirements put upon them. Ideally, such qualifiers should make sense in
any context, should keep the same meaning from one context to another, and
should keep the same meaning from one segment of the population to another.
In addition, their metric properties should be known, so that statistics
requiring equal and invariant scale intervals can be used without violating
basic measurement assumptions. Progress has been made toward meeting some
of these requirements (Cliff, 1959), but so many unsolved problems remain
that some rating scale users avoid adverbial qualifiers altogether. They
prefer to define scale steps by numbers or letters, or the mere fact that
the steps are in serial order.

The argument in favor of using adverbial 
qualifiers acknowledges these difficulties, but 
asserts that the most commonly used alterna- 
tives hide the difficulties instead of solving 
them. In addition, there is reason to believe 
that raters with less than college education 
are so unaccustomed to thinking abstractly 
that even faulty guideposts may be better 
than no guideposts at all. 

The argument in favor of rating stimuli one 
at a time rests upon desire to avoid interac- 
tion among stimuli. The rating of a given 
stimulus, this argument goes, is liable to be 
greatly affected by the other stimuli rated 
with it, so results of simultaneous rating will 
always be relative, rather than absolute. 

The counterargument is that all ratings are 
made against some frame of reference, and 
it is better to have the frame of reference 
specified by the array of stimuli than to have 
it depend upon raters’ private assumptions. 
The counterargument further asserts that 
simultaneous rating increases discrimination 
and reduces halo effect, because it focuses 
rater attention upon differences among stimuli 
rather than upon over-all evaluation of one 
stimulus at a time. 


METHOD 


The questionnaire used in this investigation consisted of 24 eight-step
rating scales (Wells, 1959). All the scales were bipolar, and were defined
by words or phrases at the scale poles, in the manner of the Semantic
Differential (Osgood, Suci, & Tannenbaum, 1957).

The ratings were made by approximately 400 housewives living in the New
York City area. Raters were interviewed in their own homes by 11
professional interviewers, who supervised the rating, answered questions
when necessary, reported raters' comments, and recorded observations of
rater behavior. The raters were selected by the interviewers on the basis
of socioeconomic quotas.

Verbal Format

Intelligent  ___  ___  ___  ___  ___  ___  ___  ___  Unintelligent
Extremely  Very  Fairly  Slightly  Slightly  Fairly  Very  Extremely

Nonverbal Format

Intelligent  ___  ___  ___  ___  ___  ___  ___  ___  Unintelligent

FIG. 1. Examples of the scale formats. (All raters were instructed that the
categories were to be interpreted as meaning “extremely,” “very,” “fairly,”
and “slightly” from end to middle. The only difference between the two
formats was the fact that the qualifying adverbs were printed under the
lines in the Verbal format, and omitted in the Nonverbal format.)

The rating scales used in the study all referred to human attributes, like
good-natured–stubborn, lazy–ambitious, and old-fashioned–modern. The
concepts rated were The Ideal Man, The Ideal Woman, and Myself.

Half the raters used Verbal scales, like the top scale in Figure 1. Half
used Nonverbal scales, like the other scale illustrated in that figure.

Of the users of each kind of scale, half rated the concepts as Single
Stimuli; that is, they rated each concept on every scale before going on to
the next concept. The remaining respondents rated the concepts as Multiple
Stimuli: they rated all three concepts on the first scale before going on
to the next scale.

This arrangement produced four combinations of scale format and rating
procedure: Single Stimulus-Verbal (SS-V), Single Stimulus-Nonverbal
(SS-NV), Multiple Stimulus-Verbal (MS-V), and Multiple Stimulus-Nonverbal
(MS-NV). Each interviewer used the four combinations successively: SS-V in
the first interview, SS-NV in the second interview, etc.

Some of the interviewers exceeded their quotas by an interview or two. A
few interviews were rejected because they were incomplete. The final tally
for each combination of scale format and rating procedure is shown in
Table 1.

TABLE 1
NUMBER OF INTERVIEWS FOR EACH COMBINATION

[Columns: V, NV, Total. The individual cell counts are not recoverable from
the source scan; the surviving marginal totals are 202 and 195, and the
grand total is 397.]

TABLE 2
PRODUCT-MOMENT CORRELATIONS BETWEEN SETS OF MEDIANS PRODUCED BY THE FOUR
COMBINATIONS OF SCALE FORMAT AND ADMINISTRATION PROCEDURE

                         Combinations Compared
Concept        SS-V vs.    SS-NV vs.    MS-V vs.       SS-V vs.
               MS-V        MS-NV        MS-NV          SS-NV
Ideal Man      .994        .997         [illegible]    .989
Ideal Woman    .998        .996         .99[?]         .985
Self           .980        .992         .969           .972

The numbers 1 through 8 were assigned to the eight scale steps, and
twenty-fifth, fiftieth, and seventy-fifth percentiles were computed for all
of the concepts on all of the scales under all of the conditions. The
medians were used as measures of central tendency, and the interquartile
ranges were used as measures of variability. These measures were used
instead of means and standard deviations because they do not require the
assumption of equal scale intervals.
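The percentile summary described above can be sketched as follows. This is an illustration only: the ratings are hypothetical, and the interpolation rule (linear interpolation between order statistics) is an assumption, since the text does not say how the percentiles were interpolated on the eight-step scale.

```python
def percentile(values, p):
    """p-th percentile by linear interpolation between order statistics."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

def summarize(ratings):
    """Median and interquartile range of ratings coded 1 through 8."""
    q1, median, q3 = (percentile(ratings, p) for p in (25, 50, 75))
    return {"median": median, "iqr": q3 - q1}

# Hypothetical ratings of one concept on one eight-step scale.
ratings = [1, 2, 2, 3, 3, 3, 4, 5, 8]
summary = summarize(ratings)
print(summary)
```

Because only the ordering of the steps enters the calculation, the single extreme rating of 8 leaves the median and interquartile range unchanged, which is the property the authors cite for preferring these measures.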


RESULTS 


The general question to be answered from 
the data was: What effects did the different 
combinations of scale format and rating pro- 
cedure have upon the conclusions to be drawn 
from the ratings? 

An over-all comparison of the results from 
the four combinations is shown in Table 2. 
The figures in this table are product-moment 
correlations computed by correlating the me- 
dians produced by one combination with the 
medians produced by another. These coeffi- 
cients show how predictable one set of me- 
dians is from another, when the distributions 
of medians have been adjusted for differences 
in variability and central tendency. They 
show that the relationships among the com- 
binations are very high; that gross differences 
in results are not to be expected. 
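The remark that these coefficients are adjusted for differences in variability and central tendency can be made concrete with a short sketch. The medians below are hypothetical; the point is only that a product-moment correlation is unchanged when one set of medians is shifted or stretched relative to the other.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical medians for four scales under one combination.
medians_a = [2.0, 3.5, 5.0, 6.5]
# The same medians pushed outward and shifted, as another format might do.
medians_b = [1.0 + 1.2 * m for m in medians_a]

r = pearson_r(medians_a, medians_b)
print(round(r, 3))
```

A correlation near 1 therefore says that one set of medians is predictable from the other, not that the two sets are numerically identical; systematic shifts have to be examined separately.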

Although any one set of medians is highly predictable from any of the
others, the results produced by the four combinations differed
systematically in certain ways. These differences were important enough to
make one combination seem to be the best of the four, and one combination
seem to be the poorest.

The most noticeable systematic difference among combinations was the
greater use of the end categories by raters using the Nonverbal scales.
Because three different concepts were rated on 24 scales by two different
procedures, there were 144 opportunities to compare a Verbal scale rating
with its exact Nonverbal counterpart. In 133 of these 144 comparisons, the
proportion of raters using the “extremely” category was larger on the
Nonverbal scale than on the Verbal scale corresponding to it. Even though
the end categories were identified in the instructions as meaning
“extremely,” this identification apparently faded from some raters' minds
when the word “extremely” was not constantly present as a reminder.


As a consequence of this greater use of the 
end categories, and of the fact that the rat- 
ings tended to be on one side or the other of 
the scale (rather than in the middle), the 
medians computed from the Nonverbal scales 
tended to be nearer the extremes than the 
medians computed from the Verbal scales. 
This “outward shift” of the Nonverbal me- 
dians occurred in 112 of the 144 Verbal—Non- 
verbal comparisons. 

Another systematic difference between the Verbal and Nonverbal scales was the greater variability in the Nonverbal ratings. In 118 of the 144 direct comparisons, the interquartile ranges of the Nonverbal scales were greater than the interquartile ranges of the Verbal scales corresponding to them. The size of the variability difference can be seen in Table 3, which shows the average interquartile ranges of the four scale forms. Some of this variability is undoubtedly the result of genuine differences of opinion among the raters; but the consistently greater variability of the Nonverbal scale ratings is probably (not certainly) a symptom of greater random error.
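
One way to gauge counts such as 133, 112, and 118 out of 144 is a sign test against an even split. The article reports only the counts; the sign test here is an assumption added for illustration, and it treats the 144 comparisons as independent, which they are not strictly (the same raters and concepts recur), so the tail probabilities are rough guides only.

```python
from math import comb


def sign_test_upper_tail(successes, n):
    """P(X >= successes) when X ~ Binomial(n, 1/2)."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


# 3 concepts x 24 scales x 2 procedures = 144 Verbal-Nonverbal comparisons
n_comparisons = 3 * 24 * 2

p_end_category = sign_test_upper_tail(133, n_comparisons)  # end-category use
p_outward_shift = sign_test_upper_tail(112, n_comparisons)  # outward shift of medians
p_variability = sign_test_upper_tail(118, n_comparisons)    # larger interquartile range

for label, p in [("end categories", p_end_category),
                 ("outward shift", p_outward_shift),
                 ("variability", p_variability)]:
    print(f"{label}: {p:.3g}")
```

Under these assumptions all three splits lie far outside what an even split would produce by chance.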

The expected difference between the Multiple Stimulus procedure and the Single Stimulus procedure was that the former would produce interaction among the rated concepts, while the latter would not, at least not to the same degree. Specifically, it seemed likely that the Self and Ideal ratings would be influenced by each other: that the Self ratings would be more modest when made in context with the Ideal, and that the difference between Self and Ideal would be greater when the two concepts were rated together than when they were rated separately.


TABLE 3

AVERAGE INTERQUARTILE RANGES IN THE FOUR COMBINATIONS OF SCALE FORMAT AND ADMINISTRATION PROCEDURE

(column headings only partly legible: Ideal Man, Ideal Woman; MS)

V     1.3   1.3   1.3
NV    1.6   1.6   1.8

Note. All V-NV ... direct difference t test ...

In general, these expectations were fulfilled, 
although the differences among combinations 
were small. On the Nonverbal scales, the av- 
erage difference between the Self rating and 
the Ideal rating was .7 of a scale unit when 
the Multiple Stimulus procedure was used, 
and .4 of a scale unit when the concepts were 
rated singly. On the Verbal scales, the corre- 
sponding differences were .6 for Multiple 
Stimulus, and .5 for Single Stimulus rating. 
On both kinds of scales, then, the difference 
between Self and Ideal was slightly larger un- 
der Multiple Stimulus conditions than under 
Single Stimulus conditions, but the size of the 
difference was greater when the scales were 
Nonverbal than when they were Verbal. 

The expectation that the Self rating would be more modest under Multiple Stimulus conditions was also fulfilled, although again the differences were small. This expectation was tested by defining the side on which the Ideal fell as the desirable side of each scale, and renumbering the scale steps so that the “slightly” step on the desirable side was numbered 1, the “fairly” step was numbered 2, the “very” step was numbered 3, and the “extremely” step was numbered 4. Average “desirability” ratings for Self and Ideal were







TABLE 4

AVERAGE DESIRABILITY RATING OF IDEAL AND SELF UNDER FOUR COMBINATIONS OF SCALE FORMAT AND RATING PROCEDURE

Scale positions: 1 = Slightly, 2 = Fairly

Combination    Self    Ideal Woman    Difference
V-MS           1.04       1.60           .56
NV-MS          1.30       1.98           .68
V-SS           1.08       1.61           .53
NV-SS          1.38       1.82           .44

Significance tests (direct difference t tests)

                          Self                    Ideal
Comparison            D     σD     t          D     σD     t
V-MS vs. NV-MS       .26    .11   2.36*      .38    .08   4.75**
V-SS vs. NV-SS       .30    .08   3.75**     .21    .07   3.00**
V-MS vs. V-SS        .04    .08   0.50       .01    .04   0.25
NV-MS vs. NV-SS      .08    .05   1.60       .16    .07   2.29*

* p < .05.
** p < .01.


then computed by summing and dividing over the 24 scales.


On the Nonverbal scales, the average desirability rating for the Self concept was 1.3 when the Multiple Stimulus procedure was used, and 1.4 when the concepts were rated singly. On the Verbal scales, the corresponding figures were 1.0 for Multiple Stimulus, and 1.1 for Single Stimulus. On both types of scales, then, the Self ratings were slightly more modest under Multiple Stimulus conditions than under Single Stimulus conditions, when “more modest” is defined as a smaller distance from the scale origin in the direction of the Ideal. These results are summarized in Table 4.


² This procedure was suggested by the observation that bipolar scales can be viewed as pairs of scales standing back to back, each member of the pair running from neutrality outward. Thus, the 24 eight-step scales in this study could equally well be viewed as 48 four-step scales, with the scales in each pair helping to define each other. Similarly, the seven-step scales used in most forms of the Semantic Differential could be viewed as pairs of three-step scales partially defining each other and linked by a neutral category for “don’t know” answers.


DISCUSSION 


The correlations in Table 2 show that the 
medians produced by one combination of 
scale format and rating procedure are highly 
correlated with the medians produced by any 
of the other combinations tested. In this sense, 
the combinations are practically equivalent. 

In some other senses, they are not. Systematic differences in variability and central tendency, not reflected in the correlations, influenced the results in ways which may be important in practical rating situations. The outward shift of the Nonverbal scale medians, for example, may present a problem to the investigator who wants to make statements like: “On the average, the Ideal Woman was rated as ‘extremely’ intelligent.” If the median rating had shifted from “very” to “extremely” because the end categories were overused, then such a statement would be a distortion of the raters’ intentions.

The scale formats differed in another re- 
spect. After having used both in the field, the 
interviewers were unanimously in favor of the 
Verbal format. They reported that respond- 
ents understood the Verbal scales better, and 
stuck with them longer without prodding. 
This consideration should not be overriding, 
but it is worth taking into account. 

The differences between Single Stimulus and Multiple Stimulus administration were small when the scales were Verbal. If interaction among concepts affected the ratings at all, its effect was too small to be statistically significant.³

When the scales were Nonverbal, discrimination between Self and Ideal was greater under Multiple Stimulus administration than it was under Single Stimulus administration. The difference in discrimination was not great, but it was statistically significant, and it is consistent with the idea that lack of a frame of reference in the Multiple Stimulus-Nonverbal combination encouraged raters to accentuate contrasts among stimuli, while the same lack of context in the Single Stimulus-Nonverbal combination tended to pull ratings in toward some central value. Without the adverbs as a frame of reference, raters using the Multiple Stimulus procedure appear to have focused on the stimuli themselves, using stimulus contrast as a guide to rating, and paying less attention to scale position. Without either scale-step definitions or contrasting concepts as a guide, raters using the Single Stimulus-Nonverbal procedure appear to have adopted the tactic of choosing one side of the scale or the other, and then making their ratings toward the middle of the chosen side whenever they had trouble making a decision. This tactic would have had the double effect of pulling the Ideal inward toward the Self, and pulling the Self outward toward the Ideal.


³ Whether this finding would hold for all concepts is a question. Everyone has, or can formulate, opinions about the Ideal and the Self from immediate and personally significant experience. This is not generally true of nationalities, political parties, products, or brands. When less personally significant concepts are being rated, interaction among concepts may be more of a problem.


SUMMARY AND CONCLUSIONS 


This report presented evidence on two issues: Should semantic rating scale steps be defined by adverbial qualifiers, or just left blank? And should the concepts to be rated be presented to the rater one at a time, or all at once?

Considering all the evidence, it would seem 
that the Verbal format is to be preferred to 
the Nonverbal format when the investigator 
plans to view his results in other than relative 
terms, and when the raters using the scales 
are not adept at abstractions. This conclusion 
is based on evidence from the ratings, and is 
supported by the interviewers’ reports of re- 
spondent reaction. It would apply to most 
uses of semantic scales in public opinion re- 
search. 

The choice between Single Stimulus and Multiple Stimulus administration is not so clear-cut. When the scales were Verbal, administration procedure had only a minor effect on the results, probably because the qualifying adverbs provided a relatively stable frame of reference. When the scales were Nonverbal, Multiple Stimulus procedure produced somewhat better discrimination between concepts.

If a choice between Single Stimulus ad- 
ministration and Multiple Stimulus adminis- 
tration is to be based on present evidence, it 
would seem that Multiple Stimulus adminis- 
tration is to be preferred when discrimination 
among concepts is the principal objective and 
possible distortion resulting from interaction 
among stimuli is not likely to be troublesome. 
Uses of semantic scales in a metaphorical 
sense (for example, is “pity” straight or 
curved) would fit these specifications. 

The evidence suggests, then, that the Multiple Stimulus-Verbal combination is the best of the four in many opinion research settings, with the Single Stimulus-Verbal and the Multiple Stimulus-Nonverbal combinations taking its place under certain circumstances. The evidence also suggests that the Single Stimulus-Nonverbal combination should not be used in opinion research without careful consideration of possible alternatives, even though this is the combination which has been most often used in past work with the Semantic Differential. Whatever method is chosen, the choice should be made with a thoughtful evaluation of the influences these details of format and administration may have upon results.


REFERENCES 


CLIFF, N. Adverbs as multipliers. Psychol. Rev., 1959, 66, 27-44.

OSGOOD, C. E., SUCI, G. J., & TANNENBAUM, P. H. The measurement of meaning. Urbana: Univer. Illinois Press, 1957.

WELLS, W. D. The Rutgers Social Attribute Inventory. Chicago: Psychometric Affiliates, 1959.


(Received January 25, 1960) 





Journal of Applied Psychology 
1960, Vol. 44, No. 6, 398-403 


PREDICTING RATINGS OF SALES SUCCESS WITH 
OBJECTIVE PERFORMANCE INFORMATION 


WAYNE K. KIRCHNER

Minnesota Mining and Manufacturing Company


Most research studies in the field of sales selection have not found many positive relationships between objective measures of sales performance such as number of calls made and orders obtained and over-all ratings of “success” in selling for salesmen. The criterion problem in sales research has been an unpleasant one at best; an horrendous one, at worst.

Cleveland’s review (1948) of sales research stresses the difficult problem of obtaining criteria of sales success even though there are literally mountains of so-called objective sales records available. In most instances, it is difficult to predict from objective records who the better salesman might be. From the psychologist’s viewpoint, it is extremely hard to do validation studies of tests in sales selection for the criteria are usually shaky ones.

Reasons for this are, of course, obvious and plentiful. Inequality of territories, poor record keeping, varying amounts of time spent on service activities, and lack of veracity in filling out reports by salesmen all could contribute.

Prediction from objective records is still important, however. First of all, it would provide excellent comparative data among salesmen and would help identify those things salesmen should do in order to be successful. Secondly, it would, as indicated above, be of great value for test validation purposes.

For these reasons, a research study was conducted in a large midwestern manufacturing company which compared various objective results with subjective appraisals of sales success for a group of salesmen. The object was to find out if there were positive relationships between these two kinds of data and to use the results in further test validation work in the selection of salesmen. Because the results seemed fairly positive, the study is reported here.

METHOD

Objective information about job performance was furnished by one division of this company for 40 of their industrial salesmen over a 6-month period, January to June 1959. The data furnished included monthly totals of number of calls (both total number made and the number of new calls made), number of orders (both total and new), demonstrations made, days worked, days spent with dealers, and the “par” set up in all these categories by the division.

The totals provided absolute measures of job performance while comparison of these figures against par provided relative measures of job performance. In addition, various ratios of job performance were computed. For example, the number of orders obtained by each man was divided by the number of calls that he made to obtain a possible index of sales effectiveness, an order-call ratio.

In all, 21 variables (absolute totals, percentage of par obtained, and various effectiveness ratios) were available for this sales group. All figures were based on the total 6-month performance for each man. As the results section will indicate, the salesmen were highly consistent from month to month during this period. This suggests that the data are reliable and that any differences found are real differences between the salesmen and are not arbitrary month-to-month fluctuations. It should be noted, too, that this division kept careful records and insofar as possible tended to equate territories. It was contended, in fact, by management personnel that all territories were of equal sales potential.

These objective figures were then correlated statistically with performance appraisal data for these men. The appraisals were made independently of this study during the summer and no manager, in fact, knew that any comparisons would be made between the appraisal data and sales performance records. In all, 19 appraisal factors were compared against the 21 performance factors by means of correlational analysis.
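
The two kinds of derived measures described in the Method (percentage of par and an order-call ratio) amount to simple divisions. The sketch below is illustrative only: the record layout, field names, and figures are hypothetical and are not taken from the company's records.

```python
from dataclasses import dataclass


@dataclass
class SixMonthRecord:
    """Six-month totals for one salesman (illustrative field names)."""
    shop_calls: int
    spot_orders: int
    spot_orders_par: int


def order_call_ratio(orders, calls):
    """Orders obtained per call made, a possible index of sales effectiveness."""
    if calls == 0:
        raise ValueError("order-call ratio is undefined when no calls were made")
    return orders / calls


def percent_of_par(actual, par):
    """Actual performance expressed as a percentage of the par set by the division."""
    if par == 0:
        raise ValueError("percentage of par is undefined when par is zero")
    return 100.0 * actual / par


# Hypothetical salesman: 480 calls, 120 orders, par of 100 orders.
record = SixMonthRecord(shop_calls=480, spot_orders=120, spot_orders_par=100)

ratio = order_call_ratio(record.spot_orders, record.shop_calls)
pct = percent_of_par(record.spot_orders, record.spot_orders_par)
print(ratio, pct)  # 0.25 120.0
```

The absolute total, the percentage of par, and the ratio each describe the same six months of work from a different angle, which is how a small set of raw counts yields 21 variables.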


Included in the appraisal factor list were an over-all point total based on all traits rated and a numerical conversion of the alphabetic letter grade given as an over-all rating. The remaining 17 factors were the regular factors found in the appraisal form. In effect, the appraisal data acted as ratings of “success” in selling for these men. The 21 objective factors and the 19 appraisal factors are listed in Table 1.

It should be noted that no reliability information is available for these data for the simple reason that such information could not be obtained. Test-retest was not possible, no equivalent forms were available, and odd-even or split-half was not meaningful for the forms. In addition, there is some doubt if appraisal data ever lend themselves to meaningful reliability analysis. Reliability or consistency can be inferred, however, by the fact that the general sales manager of this group reviewed each appraisal and agreed with his manager's opinion. Another drawback of the study is obvious. The managers had to be influenced in some manner by the objective data and by field reports for their men. The study then does not compare two “pure” sets of measures.


TABLE 1

... AND APPRAISAL FACTORS

Objective Performance Factors

Number of Shop Calls: Indicates total number of calls made
Number of New Account Calls: Indicates number of calls made ... not old business
Spot Orders: Indicates total number of orders obtained
New Business Orders: Indicates number of spot orders obtained from new customers
Demonstrations: Indicates number of demonstrations of products made by salesman
Days with Dealers: Indicates actual number of days working with dealers
Days Worked: Indicates actual number of days worked
Shop Calls vs. Par: Percentage of par obtained
New Account Calls vs. Par: Percentage of par obtained
Spot Orders vs. Par: Percentage of par obtained
New Business Orders vs. Par: Percentage of par obtained
Demonstrations vs. Par: Percentage of par obtained
Days with Dealers vs. Par: Percentage of par obtained
Days Worked vs. Par: Percentage of par obtained
Spot Orders/Shop Calls: Indicates order-call ratio
New Business Orders/New Account Calls: Order-call ratio for new orders
Shop Calls/Days Worked: Indicates average number ... days worked
New Account Calls/Days Worked: Indicates average number ... days worked
Demonstrations/Days Worked: Indicates average number ... days worked
Demonstrations/...

Appraisal Factors

Communicating
Controlling
Coordinating
Selling-Approach & Presentation
Selling-Overcoming Objections
Selling-Asking for Order
Stability-Maturity
Cooperation
16. Persuasiveness
17. Motivation-Drive
18. Point Total
19. Letter Grade

[Remaining entries are illegible in the source scan.]


TABLE 2

[Correlations between the measures of sales performance and the appraisal factors; the table body is not recoverable from the source scan.]


RESULTS AND DISCUSSION

Table 2 shows the intercorrelations between the two sets of variables. As is seen, there are 62 correlation coefficients out of 399 (15.5%) which equal or exceed the 5% probability level. This is, of course, well beyond chance expectation and suggests that there are real interrelationships among the variables.
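
The chance-expectation argument can be checked with a few lines of arithmetic. The binomial tail below is an added illustration, not part of the original analysis, and it assumes the 399 coefficients are independent; since they come from the same 40 men and from overlapping variables, that assumption is only approximate and the tail probability is a rough guide.

```python
from math import comb

n_objective, n_appraisal = 21, 19
n_coefficients = n_objective * n_appraisal  # every objective-appraisal pair
observed_significant = 62
alpha = 0.05

expected_by_chance = alpha * n_coefficients
observed_percent = 100 * observed_significant / n_coefficients


def binomial_upper_tail(k_min, n, p):
    """P(X >= k_min) when X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))


tail = binomial_upper_tail(observed_significant, n_coefficients, alpha)

print(n_coefficients)               # 399
print(round(expected_by_chance, 2)) # 19.95
print(round(observed_percent, 1))   # 15.5
print(f"{tail:.2g}")
```

About 20 significant coefficients would be expected by chance alone, so the 62 observed are roughly three times that number.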

The table also indicates that the following five objective variables tend to be the best predictors of appraisal factors: Spot Orders, Spot Orders vs. Par, Spot Orders/Shop Calls, New Business Orders, and New Business Orders vs. Par.

In general, all of these five show positive relationships with all or nearly all appraisal factors and suggest that order production is a strong factor in being considered a successful salesman in this division. As would be expected, these five factors also show fairly high intercorrelations with one another as Table 3 points out.


From this table, two other interesting facts emerge also. First, the total number of orders obtained (Spot Orders) is correlated highly (.95) with an order-to-call ratio (Spot Orders/Shop Calls). This means that the salesman who got the highest proportion of orders for calls made was also the man who got the greatest total business. In other words, the most effective man per call made ended up as the highest producer. Sheer volume of calls apparently was not enough. By making fewer calls but getting relatively more orders, a salesman achieved top production.


Secondly, the relationship between new busi- 
ness orders and old business (Spot Orders) 
was also fairly high (.63). This indicates that 
the person who got the old business was most 
likely to get new business as well. 


TABLE 3

PRODUCT-MOMENT INTERCORRELATIONS OF FIVE BEST OBJECTIVE MEASURES OF SALES PERFORMANCE

Measures: Spot Orders; Spot Orders vs. Par; Spot Orders/Shop Calls; New Business Orders; New Bus. Orders vs. Par

[Correlation values are not recoverable from the source scan.]

a This figure is a part-whole correlation which corrects for the fact that New Business Orders are included as part of the total Spot Orders.


In general, the conclusion seems obvious that the salesmen who were rated as the better salesmen in this division were actually those who produced more orders both in absolute and relative terms and in terms of both old and new business.

Going back to Table 2, more information about the positive relationships among the variables can also be obtained. In general, the specific findings from this table are as follows:

1. Certain appraisal factors were not predicted very well by the objective data. These were: Coordinating, Approach and Presentation, Demonstrating Products, Gaining Customer Confidence, and Stability-Maturity. None of the objective data showed significant correlations with the above five factors.

2. The greatest relationship (.48) between any two factors was between Motivation-Drive and New Business Orders vs. Par. Apparently those persons who achieved par or better in getting new business had greater motivation and drive in the eyes of their superiors. Conversely, it could be argued that those with greater motivation and drive obtain more new business.

3. Many appraisal factors were predicted well. For example, Volume, Quality, Communicating, Developing, Asking for Order-Closing sales, Motivation-Drive, and Persuasiveness were variables that showed fairly high positive relationships with objective measures.

4. Interestingly enough, the over-all appraisal letter grade (A, B, C, D, E) was best predicted by the ratio of New Account Calls vs. Par. This indicated that the salesman who was most active in seeking new business was rated higher in over-all performance.

One question obviously can be raised about the above results: how consistent are the salesmen in their performance? If the top salesman one month in terms of calls, orders, and other factors ranks at the bottom next month, then these objective data from month to month cannot be relied upon to give good estimates of performance. Fortunately, the answer to the question is a positive one. A statistical comparison of month-to-month figures over the 6-month period revealed the following intercorrelations among 43 salesmen for these variables, using the Horst method (1949) of computing interrater reliabilities: Shop Calls, .71; New Account Calls, .82; Spot Orders, .85; New Business Orders, .85; and Demonstrations, .84.

These are highly significant figures. They indicate that from month to month the salesmen are extremely consistent in the number of calls they make, orders taken, and demonstrations made. They indicate, too, that the 6-month totals for each man probably give extremely representative measures of performance. The high intermonth relationships suggest that few month-to-month fluctuations occur. Probably they are further evidence that the sales territories are fairly well equated.

From a practical standpoint, then, these data provide a solid objective base for future predictions of sales success in this division.

They also provide clues as to what a man should do in order to achieve sales success in this division. Finally, they show that objective sales data can be helpful in establishing a good criterion of sales success and that such data, in other firms, should continue to be investigated.


SUMMARY 


Objective sales information about job performance was gathered over a 6-month period for 40 salesmen of industrial equipment. This information, which included data about orders obtained, calls made, “par” attained, and various ratios of performance, was compared with sales managers’ appraisals of sales performance. In all, 21 objective variables were intercorrelated with 19 appraisal variables.

Results showed that there was a strong positive relationship among the variables. Best objective predictors of “success” were number of total orders obtained, number of new orders obtained, percentage of par obtained on total orders, percentage of par obtained on new orders, and the total orders to total calls made ratio. Order productivity seemed to be the most striking factor of success. The use of these objective data for further criterion purposes appeared to be warranted and the investigation of these data for similar purposes in other firms is encouraged.


REFERENCES 


CLEVELAND, E. Sales personnel research 1935-1945: A review. Personnel Psychol., 1948, 1, 211-255.


Horst, P. A generalized expression for the reliability 
of measures. Psychometrika, 1949, 14, 21-31. 


(Received January 1960) 





Journal of Applied Psychology 
1960, Vol. 44, No. 6, 404-406 


MULTIDIMENSIONAL SCATTERPLOTTING: 


A GRAPHIC APPROACH TO PROFILE ANALYSIS 


BERNARD RIMLAND 


Naval Personnel Research Field Activity, San Diego, California


Psychologists concerned with personnel selection have devoted much effort and attention to the intriguing possibility that significant increments in validity may reward the researcher who discovers a way to employ his tests for maximum validity without restricting himself to the usual linear methods. Ghiselli (1956), for example, reports an investigation in which a certain test, while not itself valid in the usual sense of the word, proved highly useful in identifying a subgroup of testees whose criterion performance could be validly predicted by means of a second test.

A variety of approaches to the problems inherent in nonlinear combinations may be found in the literature. Pattern analysis, profile analysis, configural scoring, the use of moderator and suppressor variables, addend scoring, interaction analysis, and differential predictability are among the terms used by various writers to identify their approaches to the problem. (For an interesting review of some of these, see Parrish, 1959.)

Most of the empirical studies of the problem (including most recently the Parrish investigation) have yielded disappointing results. Thorndike (1949) comments that considerable AAF work in World War II failed to reveal convincing evidence of nonlinear relationships. Yet the possibilities are attractive, and some positive results are on record. Ghiselli’s taxi driver study (1956) and a number of investigations showing higher validities for women than for men in predicting college grades illustrate successful applications of at least the aspect of the problem concerned with differential validity for identifiable subgroups. Meehl (1959) has recently published an interesting example of successful profile analysis in the clinical area. He compared clinical judgment against five statistical methods of classifying MMPI profiles into psychotic-nonpsychotic categories. One of the statistical methods (actually a systematized procedure combining the advantages of the clinical and objective approaches) proved most successful by achieving a “hit-rate” of 74% in a sample containing 47% psychotics.


1 The opinions expressed are solely those of the author and are in no way official; nor are they to be construed as representing those of the United States Naval Personnel Research Field Activity or Bureau of Personnel.

Despite the considerable attention given to the problem, there is still a need for a simple and straightforward method of exploring the nonlinear potentialities of a set of data. Judging from currently published reports describing the use of tests in applied settings, it seems safe to say that most psychologists do not consider the available profile statistics to be a part of their professional repertoire. Most of the proposed solutions present problems of computation and interpretation too demanding for extensive use, at least until one or more of these solutions achieve recognition as a standard statistical method.

For some time the present writer has been 
using a simple graphic technique, easy to ap- 
ply and interpret, which lends itself to a va- 
riety of problems not commonly regarded as 
amenable to so simple and straightforward an 
approach. In addition to its usefulness in non- 
linear prediction problems, the technique has 
also proved useful in a number of other ap- 
plications, such as the presentation of com- 
plex item analysis data to facilitate test con- 
struction, or the depiction of several groups 
on the same scatterplot to facilitate the 
understanding of puzzling intercorrelational 
values. Illustrations of these applications will 
be presented below. 

Let us assume that we are interested in predicting college grade-point average, and have as our data IQ scores, scores on an academic motivation test, and criterion grade averages. We could prepare a bivariate scatterplot of IQ vs. GPA; then, instead of merely entering on this plot a point to indicate the location of each subject as is usually done, we divide the group into thirds on the motivation test and enter on the graph a code symbol, h, m, l, for each student to represent a high, middle, or low position on the motivation test. Thus we could examine simultaneously the relationships among all three variables.
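
The plotting procedure just described can be sketched in a few lines. This is an illustration on a character grid rather than the hand-drawn plot the author used; the function names, grid size, and the nine students' scores are all hypothetical.

```python
def tercile_codes(scores):
    """Code each score l, m, or h by the third of the group it falls in.

    Ties are broken by original order (an arbitrary choice for this sketch).
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    codes = [""] * n
    for rank, i in enumerate(order):
        codes[i] = "l" if rank < n / 3 else ("m" if rank < 2 * n / 3 else "h")
    return codes


def coded_scatterplot(x, y, codes, width=40, height=12):
    """Place each case's code symbol at its (x, y) position on a text grid.

    Assumes x and y each take at least two distinct values. If two cases
    land in the same cell, the later one overwrites the earlier.
    """
    xmin, xmax, ymin, ymax = min(x), max(x), min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi, code in zip(x, y, codes):
        col = round((xi - xmin) / (xmax - xmin) * (width - 1))
        row = round((ymax - yi) / (ymax - ymin) * (height - 1))  # high y at top
        grid[row][col] = code
    return "\n".join("".join(line) for line in grid)


# Hypothetical data for nine students.
iq = [95, 100, 105, 110, 115, 120, 125, 130, 135]
motivation = [12, 30, 45, 15, 33, 48, 10, 35, 50]
gpa = [1.6, 2.2, 2.4, 1.5, 2.3, 3.0, 1.7, 2.5, 3.6]

codes = tercile_codes(motivation)
plot = coded_scatterplot(iq, gpa, codes)
print(plot)
```

In this made-up example the l symbols stay near the bottom of the plot across the whole IQ range, while the h symbols rise with IQ, which is the kind of pattern the technique is meant to expose.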

If the h’s tended to fall along the regression line, we could hypothesize that only for students high in motivation are IQ and GPA highly correlated. If the l’s were evenly distributed along the IQ axis but concentrated below the mean of the GPA axis we might conclude that low interest results in low GPA irrespective of IQ. Figure 1 illustrates these relationships with hypothetical data. Note that the three intercorrelations, GPA vs. IQ, GPA vs. Motivation, and IQ vs. Motivation, would each be near zero and would not reveal the relationships determinable from the figure.

If the symbols were arranged in three similar appearing layers parallel to the GPA axis we would know that the motivation test was highly related to IQ but not to GPA. Again, if the symbols were randomly scattered on the correlation surface we could save ourselves the trouble of plotting or computing the correlations between motivation and IQ and between motivation and GPA, since these relationships must be very low. Of course the shape of the correlation surface itself, as determined by the positions rather than the shapes of the symbols, would provide us with the usual observation regarding the relationship between IQ and GPA.

We could add a fourth or fifth dimension such as sex, age, college major, or “sociability” to the above three-dimensional scatterplot by using one of several colors of pencil to make the symbols, or to underline or encircle them. The use of color for adding extra dimensions to the scatterplot is probably superior to the use of geometric symbols, numbers, or letters, since the eye can detect color patterns quite readily. Holding a piece of colored cellophane before the eye might assist in detecting complex patterns when colors are used in the graph.














Fig. 1. Hypothetical data showing interrelation-
ships among GPA, IQ, and high, middle, and low
groups on a motivation test.


Of course, as in any system in which capi- 
talization on chance may be a factor, findings 
should be verified on a second sample before 
much reliance is placed upon them. Often it 
is necessary to count cases falling into cer- 
tain areas on the plot to verify visual impres- 
sions. Quantitative analysis will be desirable 
in many cases where rough estimates are in- 
adequate or where it is necessary to verify, to 
evaluate the extent or significance of, or sim- 
ply to summarize the findings based on the 
visual inspection. When used as a tool for 
pattern analysis, the graphic technique should 
be considered primarily a means for exploring 
the possibilities of a set of data. For obtain-
ing evaluations of predictors, for comparison,
or other purposes, one may wish to compute 
validity coefficients for certain segments of 
the sample. 

Multidimensional scatterplots may also be 
used to display data in situations where no 
sampling is involved. For instance, the writer 
once constructed a Naval Knowledge Test 
(NKT) which was to have as low as possible 
a correlation with the Navy General Classifi- 
cation Test (GCT) (Rimland, 1955). It was 
very difficult to find items which had good 
$r_{i(\mathrm{NKT})}$ values yet were not too difficult nor
too highly correlated with GCT. To facilitate
item selection, the $r_{i(\mathrm{GCT})}$ coefficients for the
items were plotted along one axis, the $r_{i(\mathrm{NKT})}$
coefficients along the other axis, and the
identification numbers of the items in the
pool were entered at the proper points on this
graph. The item numbers were written in dif-
ferent colors to indicate the P-value levels of
the items. Geometrical symbols were used to
circumscribe each item number to identify 
which of several kinds of naval knowledge 
content the item was measuring. Clerical 
workers were able to plot this graph in less 
than 2 hours for the 144-item pool. It took 
the writer very little time to select the most 
desirable items from this simultaneous display 
of four kinds of item data. 

On investigating the reliability of the re- 
sulting 45-item NKT, on a new group of sub- 
jects, it was found that the odd-even correla- 
tion value was higher for a selected subgroup 
of high-aptitude recruits than for the total 
random sample of recruits from which the re- 
stricted range subgroup was drawn. Multidi- 
mensional scatterplotting was used to investi- 
gate this unusual phenomenon by plotting the 
“evens” and “odds” scores on the axes of a
graph and entering the graph with Xs to in-
dicate members of the curtailed group and Os
to indicate the remainder of the sample. In-
spection of the graph showed the unusual re-
liability relationship to result from the high
difficulty of the test. The Os formed a circular
cluster at the lower left-hand corner while the
Xs were distributed along the entire diagonal.

Frequently making a three- or four-dimen-
sional scatterplot involves no more effort than
making a conventional two-dimensional one.
If one is going to enter scores for boys and
girls on the same scatterplot, it is usually no
trouble to plot the boys' scores as Xs, then
switch to Os for plotting girls' scores. If the
boys and girls happen to be arranged in order 
of age, change from black pencil to red at the 
median point in each age list to obtain a four- 
dimensional scatterplot at a minimum of ex- 
pense in time and effort. (The four dimen- 
sions are the X and Y axes, age, and sex). 
The writer once saw a scatterplot in which 
scores on an aurally presented test were be-
ing plotted against scores on a visually pre-
sented test. The correlation was essentially
zero. Later the researcher completed separate
correlations for the men and women in his
sample. The r for men was about .60, while
that for women was about −.45. The present
writer does not know if this striking finding
was ever verified on a new sample, but the
finding would have been immediately appar-
ent had the researcher followed the procedure
suggested here.
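
How a pooled correlation near zero can conceal strong subgroup relationships may be seen in a small numerical sketch. The data below are invented for illustration and are deliberately extreme; `pearson_r` and `r_by_group` are hypothetical helper names, not part of any procedure described in the article.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_by_group(x, y, groups):
    """Compute the correlation separately within each coded subgroup."""
    result = {}
    for g in sorted(set(groups)):
        xs = [a for a, k in zip(x, groups) if k == g]
        ys = [b for b, k in zip(y, groups) if k == g]
        result[g] = pearson_r(xs, ys)
    return result


# Invented scores: the two subgroups show opposite trends.
aural = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
visual = [1, 2, 3, 4, 5, 5, 4, 3, 2, 1]
sex = ["M"] * 5 + ["F"] * 5

print("pooled r:", round(pearson_r(aural, visual), 2))
for group, r in r_by_group(aural, visual, sex).items():
    print(group, round(r, 2))
```

On a scatterplot with Xs for one group and Os for the other, the two opposed diagonals would be visible at a glance, whereas the pooled coefficient alone gives no hint of them.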


SUMMARY 


A graphic approach to portraying multidi-
mensional relationships was discussed. It con-
sists of entering coded symbols on a bivariate
scatterplot. The technique is simple and easy
to use and has been found valuable for sug-
gesting hypotheses and exploring relationships
in complex sets of data.

Among the suggested applications were: im- 
proving the validity of selection test batteries 
by revealing nonlinear relationships useful in 
predicting performance, displaying complex 
item data as an aid in test construction, and 
clarifying correlational relationships where re-
sults are inconsistent for several groups or
subgroups of subjects.


REFERENCES 


Ghiselli, E. E. Differentiation of individuals in
terms of their predictability. J. appl. Psychol.,
1956, 40, 374-377.

Meehl, P. E. A comparison of clinicians with five
statistical methods of identifying psychotic MMPI
profiles. J. counsel. Psychol., 1959, 6, 102-109.

Parrish, J. A. A study of two non-linear methods
of combining predictor tests. USA TAGO Person-
nel Res. Br. tech. res. Note, 1959, No. 103.

Rimland, B. The Naval Knowledge Test for en-
listed men: I. Development of the test. USN Bur.
Naval Personnel tech. Bull., 1955, No. 55-8.

Thorndike, R. L. … Wiley, …





Journal of Applied Psychology 
1960, Vol. 44, No. 6, 407-412 


THE VALIDITY OF BEHAVIORAL RATING SCALE ITEMS
FOR THE ASSESSMENT OF INDIVIDUAL CREATIVITY

WILLIAM D. BUEL

Pure Oil Company, Chicago, Illinois


In recent years great impetus has been given to the study of
creativity, the impetus having arisen because of the increasing
demands by academic and business organizations for some yardstick
or criterion of creative behavior or potential. Sprecher (1959)
has worked in the area of criterion development as it relates to
creativity among engineers. He points out that previous to his
study, no investigator had permitted the judges to interpret the
term creativity in their own way, that is, without recourse to a
definition utilized for the purposes of the study. Sprecher’s
procedure permitted such an interpretation by allowing the judges
to give reasons why the men they ranked highest in creativity
differed from those ranked lowest and also why the answers to
open-ended engineering problems were judged to be creative. In
the report of his work the more predominant reasons were presented
in tabular form, but no attempt was made to investigate the
validity of those reasons. The purpose of the study reported here
was to allow judges to interpret creativity in behavioral terms so
that the validity of their interpretations might be investigated.


PROCEDURE


Twenty male research supervisors in the laboratory of a major oil
company were interviewed in regard to the behavioral
characteristics they attributed to those under their supervision.
These informants, by the nature of their departmental assignments,
represented all research areas of the research center. During the
interview the informants were asked to anonymously describe, in
behavioral terms, the most creative man and the least creative man
in their department. At no time during any of the interviews was a
definition of creativity introduced.


1 The author wishes to thank H. L. Hemmingway and the other
members of the research center staff who made this study possible.
Thanks are also extended to A. E. Quade, Jr. for the development
of the computer program necessary for the statistical analysis.



Approximately 900 statements were collected through the interviews
described above. From these 900 statements 143 were chosen which
appeared to be nonredundant with the other statements in the
group. No effort was made to select statements (items) by guessed
factor or type. These 143 statements were […] into a descriptive
check list to be […] personnel evaluation instrument wherein the
[…] to select for each item a […] man exactly) to 1 (does not […])
as being descriptive of the […] evaluating. This check list was
[…] research personnel. To insure anonymity and encourage
objectivity a code letter was assigned to each rater and a code
number to each ratee.


The ratees were male chemists, geologists, physicists, and
engineers with various areas of specialization subsidiary to each
of these, that is, organic chemistry, chemical engineering,
petroleum engineering, etc. Twenty-one percent of the ratees held
the doctor’s degree, 12% held the master’s degree, […] held the
bachelor’s degree and 5% had training but no degree in one of the
areas mentioned. The mean age of the group was 39, ranging from
[…].

Prior to the actual rating, a seminar was held with the raters to
describe the approach, discuss the necessity for and means of
achieving objectivity, and to answer any questions which would not
introduce bias into the ratings. The rater group consisted of 13
of the original informants and 7 research supervisors, bringing
the total number of raters to 20. Eighteen of the raters rated
four persons each while the remaining two rated three persons
each.

After all of the 78 research personnel had been rated by their
immediate supervisor, an artificial criterion was developed. This
statement roughly defined creativity, but only after the
non-definition-oriented ratings had been completed. The statement
read:

Hypothetically you have been selected to be director of a research
center for another oil company. The Pure Oil Company has agreed to
let you take with you to your new organization certain of the
persons now at The Pure Oil Company Research Center. You have been
instructed by your new management to bring with you those persons
who will make the most significant, original and lasting
contributions to research. In your choice of personnel try to
focalize your evaluation on the three underlined characteristics
above, disregarding the individual’s field of specialization.

The director of research, the assistant director of research, and
the assistant research manager of the research





TABLE 1

…TURE OF ITEMS VALID FOR THE ASSESSMENT
OF INDIVIDUAL CREATIVITY


Is conversant o1 
in his field 


Looks for new ways of doing things 


Stimulates the creativity in his associates 


Has represented his superiors in other departments 
of the research center 

Has expressed a desire to 
problen 


Has expressed a desi { 


e to be 
Uses the lates chni 


sues to solve 


assigned to him 
Has tackled problems others avo 
Participates in the activities of 

societies of his chosen fie 
Has supervised the work of 
specialization 
Puts diverse piece s of inf 

at a valid conclusion 
Has proposed entirely ipproacl 
Deve lops hypotheses Ss 
Seeks knowledge for 

his own past 

Has imp oved upon the 

superiors 


Follows instructior 


His approacl to ey 
Enthusiastic al 
Questions gene 


tt 


Has overlool 

Uses proces 

Has neglect 

\pproaches a pr m in tri 
His personal problems have 
Has ignored « 


Fails to sell his 


Is disdainful of 1 
His wr en reports include 


Watches the clock 


ntechnical personne 
+ 


irrelevant information 


His biases influence the objectivity 


interpretations 





TABLE 1 


Item 


Number \ , Numbe 


His writing s inds t pr lem is reporting 
upon 

Is preoccupied 

Has underestir 


project 


Can transfor 


Has called a 


ork of ot 


Has grasped 


stvmied other 
Is energetic 
Presents convincing 
of his point of viev 
Has expressed a desir 
management 
Comprehends a | 
Works overtime 
Identifies the logic 
} 


Has de velopec 


Asks for the rea 
changes 
Has questiones 


Openly seeks 


Fails to recognize 
Has chosen to usé¢ 
proven to be un} 
Has proposed ideas wl ‘ yntrary ( t 
Type F

Is a technical perfectionist
His literature searches are exhaustive

Type G

Interrupting his work disorganizes him
Has expressed a desire to work on only one problem at a time

Type H

Has expressed a desire to remain at his present level of responsibility
Has requested routine jobs


Note. Coefficients greater than .38 are significant at the .001 level. Coefficients greater than […] are significant at the
.01 level of confidence. All other coefficients are significant at the […] level of confidence.





center evaluated all 78 research personnel on this basis, since it
was assumed that individually they had the best overall knowledge
of the creativity of these people. The three criterion evaluations
were done independently according to the following routine. Of the
list of 78 persons provided to each criterion rater (4 persons
were later deleted because of unavailability of ratings), he chose
the 15 whom he considered most creative in the light of the
criterion statement. These persons he labeled A on a specially
developed alphabetic list of names. After so labeling the first 15
persons, he ranked them within the A group. He next chose, from
the remaining 63 persons, the 16 whom he considered next most
creative, all of whom he labeled B and ranked within their group.
This procedure was continued until all persons were included in
one of the five groups (A through E) and had been ranked within
their group. These letter-number combinations were easily
converted into a continuous ranking across all 78 persons, for
example, A-1 became Rank 1, A-2 became Rank 2, and so on down to
B-1 or Rank 16, B-16 or Rank 31, etc.

Using the method proposed by Hull (1922), the ranked data were
converted into scores on a linear scale, amenable to treatment by
the product-moment correlation formula. Prior to the application
of the Hull technique, adjustments were made in the three
criterion rankings to correct for the four unavailable sets of
data. Interrater reliability coefficients were calculated. Rater 1
correlated .78 with Rater 2 and .68 with Rater 3. Rater 2
correlated […] with Rater 3. Using the r to z transformation for
averaging correlations (Guilford, 1950) an average correlation of
.73 was obtained. In the light of this correlation it was decided
that the average of the three linear scale scores for each man
would serve as his criterion score.

Validities for the 143 check list items were calculated against
this criterion by an IBM 705 computer, using as the item scores
the values (1 through 5) assigned by each rater to each item in
regard to each ratee. Further, intercorrelations were calculated
among all items. Both indices were in the form of product-moment
coefficients of correlation, based on 74 cases.
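
The averaging of interrater correlations by the r to z transformation can be illustrated briefly. The sketch below is an assumption-laden example rather than the computation actually run: `average_r` is a hypothetical helper, and because the Rater 2 vs. Rater 3 coefficient is not given in the text, the value .72 is only a placeholder chosen for illustration.

```python
from math import atanh, tanh


def average_r(correlations):
    """Average correlations via Fisher's r-to-z transformation.

    Each r is converted to z = atanh(r), the z values are averaged,
    and the mean z is converted back to r with tanh.
    """
    z_values = [atanh(r) for r in correlations]
    mean_z = sum(z_values) / len(z_values)
    return tanh(mean_z)


# .78 and .68 are the reported coefficients; .72 is a placeholder assumption.
interrater = [0.78, 0.68, 0.72]
print(round(average_r(interrater), 2))
```

Averaging in the z metric rather than averaging the raw coefficients gives slightly more weight to the higher correlations, since z grows faster than r as r approaches 1.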


Finally, an elementary linkage analysis was performed according to
the method suggested by McQuitty (1957). Linkage analysis produces
“types” (clusters) of items in essentially the same way as does
cluster analysis utilizing the Holzinger and Harman B coefficients
(Chandler, 1959). Since the purpose of the study revolved around
valid items ([…] level or better), a submatrix (59 × 59) was
developed. Linkage analysis was performed on this reduced matrix
only up to and including the identification of types, that is, no
“typal relevancies” were computed. Because of the reduction of the
matrix it is possible that a slight artificiality of type was
created; however, the types presented are made up entirely of
valid items.




RESULTS 


Table 1 presents the eight types of items 
found by the analysis of the submatrix. The 
items are numbered from 1 to 59 across the 
eight types, and are presented under their 
“typal” letter, no effort having been made to 
name the type. The item numbers were as- 
signed strictly on the basis of the magnitude 
of the absolute validity of the item within 
its type. In the first column after each item 
will be found its validity (ry). In the second 
column will be found the highest correlation 
(ric) of that item with some other item in 
the type. In the third column will be found 
the number of the item with which the item 
under consideration has the highest correla- 
tion. It should be noted that in each type two 
items are “reciprocal,” that is, have their 
highest correlations with each other. It should 
also be noted that several items may have 
their highest correlation with the same item, 
suggesting the existence of subtypes. 
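
The structure described above, in which every item is tied to the item with which it correlates most highly and each type is built around one reciprocal pair, can be sketched as follows. This is a simplified reading of elementary linkage analysis, not a reproduction of McQuitty's full procedure or of the program used in this study; the function names and the 5 × 5 matrix are hypothetical, signed correlations are assumed, and ties are ignored.

```python
def highest_links(r):
    """For each item, the index of the other item with which it correlates most highly."""
    n = len(r)
    return [max((j for j in range(n) if j != i), key=lambda j: r[i][j])
            for i in range(n)]


def linkage_types(r):
    """Group items into types by following highest-correlation links.

    Returns (types, reciprocal_pairs): each type is a sorted list of item
    indices; each reciprocal pair is two items that are each other's
    highest correlate.
    """
    link = highest_links(r)
    n = len(r)
    parent = list(range(n))          # union-find forest over the items

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in enumerate(link):     # merge every item with its highest correlate
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    reciprocal = sorted((i, j) for i, j in enumerate(link)
                        if link[j] == i and i < j)
    return sorted(groups.values()), reciprocal


# Hypothetical interitem correlation matrix for five items.
r = [
    [1.0, 0.8, 0.6, 0.1, 0.2],
    [0.8, 1.0, 0.5, 0.2, 0.1],
    [0.6, 0.5, 1.0, 0.3, 0.2],
    [0.1, 0.2, 0.3, 1.0, 0.7],
    [0.2, 0.1, 0.2, 0.7, 1.0],
]

types, pairs = linkage_types(r)
print("types:", types)
print("reciprocal pairs:", pairs)
```

In this invented matrix items 0 and 1 are reciprocal and item 2 attaches to item 0, giving one type of three items; items 3 and 4 form a second type. When several items share the same highest correlate, as item 0 is shared here, the pattern corresponds to the subtypes mentioned in the text.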


DISCUSSION 


Several things bear mention in regard to
the results of the linkage analysis. Because,
as previously mentioned, no effort was made
to collect either positive items, negative items,
or items which corresponded to any guessed
type, certain of the types appear to contain 
items which are roughly the inverse of items 
in some other type. Because the majority of 
the positively phrased items achieved positive 
validity and the majority of the negatively 
phrased items achieved negative validity, it 
would be next to impossible for a positively 
and negatively phrased item bearing on the 
same researcher characteristic to be included 
in the same type. With this in mind, it is pos- 
sible that only four or five types actually 
arose from the analysis. 

A review of some of the literature in the field shows
corroboration of some of the findings reported here, some items
gaining support from the work of several persons. Roe’s work with
eminent scientists (1952) lends support to Item 2 insofar as she
discusses the characteristic researcher need to find things out
for himself, to Item 17 as it relates to personal independence,
and to Items 19, 33, 36, 41, 45, 54, and 55 insofar as these items
relate to a driving absorption in one’s work. Elsewhere (1956) she
discusses the researcher’s lack of personal relations skills
(Items 31 and 49).

Barron (1955), in his investigation of “the disposition toward
originality,” discusses a preference for complexity in phenomena
(Items 5, 40, and 59). He further cites evidence of dominance and
the need for power and recognition (Items 6 and 50) as well as
evidence of independent behavior and a dislike for externally
imposed control (Items 16, 17, 48, and 49). Elsewhere (1957)
Barron, in examining the relationship of originality to
personality and intellect, speaks of creativity’s relationship to
leadership (Items 10 and 43), of a “disposition toward integration
of diverse phenomena” (Item 11), of persuasiveness (Items 27 and
42), of communication skills (Items 32, 35, and 42), of a tendency
among unoriginal persons not to become involved in things (Item
58), of rigidity and inflexibility as characteristic of unoriginal
persons (Item 18), of the relationship of unoriginality to
vacillation and inability to make decisions (Item 25), of a high
degree of intellect (Item 44), and of ascendance in relations with
others (Item 43).

Kettner, Guilford, and Christensen (1959), in a factor analysis of
reasoning, creativity, and evaluation, identify a factor of
originality. Items 2, 12, and 18 appear to bear on this factor.
Further, they isolate a factor named “eduction of conceptual
relations” (Items 11, 39, and 44). Wilson, Guilford, and
Christensen (1953), in their study of individual differences in
originality, appear to be dealing in terms similar to Items 2, 12,
18, and 47.

Items 16, 17, 48, and 49 seem to be
supported by a study of factors related to
success in technical jobs (Maschino, 1959)
wherein, similar to the previous references to
Barron and to Roe, the author identifies in-
dependence as characteristic of the success-
ful technical employee. Maschino further dis-
cusses a lack of anxiety in successful techni-
cal personnel (Item 25).

The work of the staff of the Employee Re- 
lations Research Section of the Standard Oil 
Company of Indiana (1959) points out the 
relationship of creativity to supervisory abil- 


ity (Item 10), to desire for responsibility 
(Items 43 and 58), and to an inclination to 
have many things on the fire at one time 
(Item 57). Elsewhere Owens, Smith, Albright, 
and Glennon (in press) discuss the hypothe- 
sized emergence of a factor named “self-con- 
fidence” (Items 5, 6, 8, 16, 43, and 50). 

Sprecher (1959), studying engineers’ cri- 
teria for creativity, presents reasons given for 
ranking engineers high or low in creativity. 
He also presents reasons why answers to open- 
ended engineering problems were considered 
creative. Although his reasons were not vali-
dated, only those reasons bearing on number
or volume of solutions (not represented by 
analogous items in the present study) remain 
unsupported as discriminators between crea- 
tive and noncreative persons. 

Aside from the corroborations discussed
above, inspection of Table 1 raises some in-
teresting considerations. As the reader can
quickly determine, many of the items deal
with personal efficiency or effectiveness. Pro-
ductivity is no doubt reflected by some of
the items. If such is the case, the question is
raised as to how much effect “industrializa-
tion” has on the research supervisor and
worker, that is, does their conception of crea-
tivity become contaminated by what they feel
management considers successful behavior, be
it creative or otherwise?


In any research designed as this re-
search was, bias may become a major contami-
nant. Certain items in Table 1, for example,
Items 17, 31, 36, and 49, are negatively
phrased and yet achieved positive validity.
Such items may be testimony to the fact that
relatively little bias was operating when the
ratings were performed. On the other hand,
the fact that the majority of the positively
phrased items achieved positive validities and
the negatively phrased items achieved nega-
tive validities may suggest some bias. Un-
doubtedly the content of the individual item
determines to a large degree whether or not
bias will operate in the application of that
item to the ratee.


In the seminar mentioned earlier it was de- 
cided that criteria such as patents issued, 
patents applied for, patent disclosures, and 
publications were not to be used to deter- 
mine item validities because certain areas of 





the research center, although definitely en- 
gaged in creative research work, are not in an 
area where there is an opportunity to patent, 
etc. However, after the ratings and criterion 
information had been collected, curiosity led 
the investigator to determine the correlation 
between the criterion used and these four re- 
jected criteria. The data for these rejected 
criteria were collected for only one year 
(1958). Some error is thus introduced into 
these three patent criteria because of the 
time lag (several years in some cases) be- 
tween submitting a patent disclosure, filing a 
patent application, and the actual issuance 
of a patent. Also, the newer men are dis- 
criminated against by such a short data col- 
lection period. Considering these limitations, 
the criterion used correlates .42 with patent 
disclosures, .40 with patent applications, .29 
with patents issued, and .13 with publica-
tions. The systematic decrease among the first
three correlations appears to be accounted for
by the progressive attrition of disclosures to
applications and applications to patents issued.
In the light of these correlations, bias by cri-
terion raters in favor of persons who patent,
etc. would seem slight. The fact that the four 
rejected criteria do not correlate particularly 
well with the criterion used may be evidence 
relative to the inefficacy of patents and publi- 
cations as criteria for the assessment of the 
creative individual. 

Finally, independent research is needed to 
test the items presented here. Such research 
is represented by a study now going on in the 
research center of an unaffiliated organization 
with the purpose of determining the validity 
of these items when they are used in forced-
choice form.


SUMMARY 


Research supervisors in the laboratory of 
a large oil company were asked to anony- 
mously describe the most and least creative 
research men under their supervision, without 
recourse to a definition of creativity. The 
behavioral statements so obtained served as 
microdefinitions of creativity and were used 
as descriptive check list items to rate person- 




nel in a wide variety of research activities. 
Validity coefficients and interitem correlation 
coefficients were computed for each item. 
Items demonstrated to be valid were sepa- 
rated into types through linkage analysis. 
Certain supporting studies were discussed in- 
sofar as they corroborate the results presented 
here. The relationship of the artificial cri- 
terion to certain commonly used criteria was 
discussed. Finally, it was suggested that the 
items presented may be valid discriminators 
between relatively more or less creative per- 
sons in a wide variety of research areas. 


REFERENCES 


Barron, F. The disposition toward originality. J.
abnorm. soc. Psychol., 1955, 51, 478-485.

Barron, F. Originality in relation to personality and
intellect. J. Pers., 1957, 25, 730-742.

Chandler, R. E. An empirical comparison of link-
age analysis with cluster and factor analysis.
Detroit: General Motors Corp., 1959. (Mimeo)

Guilford, J. P. Fundamental statistics in psychol-
ogy and education. New York: McGraw-Hill, 1950.

Hull, C. L. The computation of Pearson’s r from
ranked data. J. appl. Psychol., 1922, 6, 385-390.

Kettner, N. W., Guilford, J. P., & Christensen,
P. R. A factor analytic study across the domains
of reasoning, creativity, and evaluation. Psychol.
Monogr., 1959, 73(9, Whole No. 479).

McQuitty, L. L. Elementary linkage analysis for
isolating orthogonal and oblique types and typal
relevancies. Educ. psychol. Measmt., 1957, 17,
207-229.

Maschino, A. Factors related to success in technical
jobs. Midland, Mich.: Dow Chemical Co., 1959.
(Mimeo)

Owens, W. A., Smith, W. J., Albright, L. E., &
Glennon, J. R. The prediction of research com-
petence and creativity from personal history. J.
appl. Psychol., in press.

Roe, A. A psychologist examines sixty-four eminent
scientists. Scient. Amer., 1952, 187, 21-25.

Roe, A. The psychology of occupations. New York:
Wiley, 1956.

Sprecher, T. B. A study of engineers’ criteria for
creativity. J. appl. Psychol., 1959, 43, 141-148.

Standard Oil Company of Indiana, Central Em-
ployee Relations Department, Employee Relations
Research Section. Correlation of personal history
with research performance. Chicago: Standard Oil
Co. of Indiana, 1959. (Mimeo)

Wilson, R. C., Guilford, J. P., & Christensen,
P. R. The measurement of individual differences in
originality. Psychol. Bull., 1953, 50, 362-370.


(Received February 11, 1960)





































































































